ABSTRACT

We analyze three queueing control problems that model a dynamic stochastic distribution
system, where a single capacitated vehicle serves a finite number of retailers in a
make-to-stock fashion. The objective in each of these vehicle routing and inventory problems
is to minimize the long run average inventory (holding and backordering) and transportation
cost. In all three problems, the controller dynamically specifies whether a vehicle at
the warehouse should idle or embark with a full load. In the first problem, the vehicle must
travel along a prespecified (TSP) tour of all retailers, and the controller dynamically decides
how many units to deliver to each retailer. In the second problem, the vehicle delivers an
entire load to one retailer (direct shipping) and the controller decides which retailer to visit
next. The third problem allows the additional dynamic choice between the TSP and direct
shipping options. By assuming that the system operates under heavy traffic conditions, we
approximate these queueing control problems by diffusion control problems, which are
explicitly solved in the fixed route problems, and numerically solved in the dynamic routing
case. Simulation experiments confirm that the heavy traffic approximations are quite
accurate over a broad range of problem parameters. Our results lead to some new observations
about the behavior of this complex system.
1      Introduction

A  prototypical  example  of  the  inventory-routing  problem  (IRP)  is  the  challenge  faced  by  a 
large  oil  company  as  it  distributes  gasoline  to  its  various  gas  stations:  several  warehouses, 
which  hold  inventory  of  a  particular  item  (gasoline),  serve  a  set  of  retailers  (stations)  in  a 
make-to-stock  fashion;  arriving  customers  (automobiles)  consume  the  product  at  these  retail 
sites,  and  a  fleet  of  finite  capacity  vehicles  (tanker  trucks)  is  used  to  transport  the  product 
from  the  warehouse  to  the  various  retailers. 

The  management  decisions  involved  in  the  design  and  operation  of  such  a  system 
are  many-fold  and  complex.  Traditionally,  a  hierarchical  decomposition  of  the  problem  is 
used  to  allow  for  a  solvable  model  at  each  of  the  levels  (e.g.,  Simchi-Levi  1992).  At  the 
strategic  level,  the  managers  of  this  system  must  determine  the  location  and  number  of 
warehouses  and  retailers,  as  well  as  the  assignment  of  retailers  to  warehouses.  At  a  tactical 
level,  they  must  decide  on  the  number  of  vehicles  to  operate,  and  possibly  on  the  assignment 
of  vehicles  to  service  districts.  At  the  operational  level  the  decisions  include:  whether  to 
send  a  particular  vehicle  out  or  let  it  idle,  how  much  of  the  capacity  of  the  vehicle  to  use, 
which  of  the  retailers  should  each  vehicle  visit,  and  how  much  of  its  load  should  a  vehicle 
deliver  to  each  of  the  retailers  on  its  route. 

At  the  tactical  and  operational  levels,  the  essence  of  the  IRP  is  the  tradeoff  between 
inventory  costs  and  transportation  costs:  in  order  to  reduce  inventory  levels  at  the  retail 
sites  without  affecting  the  service  level,  more  frequent  replenishment  deliveries  are  required, 
thereby  increasing  the  transportation  costs.  In  many  applications,  customer  demand  (and 
to  a  lesser  extent,  vehicle  travel  times)  is  subject  to  considerable  stochastic  variation.  In 
such  cases,  a  stochastic  model  is  required  to  accurately  capture  the  inventory  costs,  and  in 
this  paper  we  focus  on  the  operational  aspects  of  the  IRP  in  a  dynamic  stochastic  setting. 

The field of operations research is sometimes criticized because real-world applications
have lagged behind theoretical progress (e.g., Ackoff 1987). The IRP is an important
counterexample to this perception: while heuristics for this notoriously difficult problem have
led to some spectacularly successful industrial applications at both the operational (e.g., the
Edelman Prize winning work of Bell et al. 1983, Golden et al. 1984) and tactical (Larson
1988) levels, a concomitant mathematical theory for the IRP in a dynamic and stochastic
environment has not been forthcoming. Federgruen and Zipkin (1984) analyze a single-period
IRP with stochastic demand, and Dror and Ball (1987) develop a heuristic technique to
reduce the long run average problem to a single period problem. Recent studies that
consider the operational aspects of the stochastic IRP include Trudeau and Dror (1992), who
develop  heuristics  for  the  case  of  an  external  supplier,  where  retailer  inventories  are  only 
observable  at  delivery  times;  Minkoff  (1993),  who  constructs  a  decomposition  heuristic  for  a 
Markov  decision  model  that  dispatches  vehicles  on  a  prespecified  set  of  itineraries,  where  each 
itinerary  is  characterized  by  an  inventory  allocation  to  a  subset  of  customers;  and  Kumar, 
Schwarz  and  Ward  (1995),  who  develop  myopic  static  and  dynamic  strategies  for  allocating 
the  contents  of  a  vehicle  to  the  various  retailers  on  a  predetermined  tour.  Chan,  Federgruen 
and  Simchi-Levi's  (1994)  probabilistic  analysis  of  random  instances  of  the  deterministic  IRP 
is  useful  for  addressing  tactical  and  strategic  issues,  but  has  no  bearing  on  the  operational 
aspects  of  the  IRP  with  stochastic  demand. 

Our  system  model  has  one  warehouse  and  one  capacitated  vehicle;  hence,  we  effectively 
assume  that  the  higher  level  decisions  have  been  made  to  assign  a  single  warehouse  and  a 
single  vehicle  to  serve  all  retailers  in  a  particular  region.  An  ample  amount  of  inventory  is 
available  at  the  warehouse,  and  the  cost  of  holding  this  inventory  is  not  included  in  the  model. 
Retailer  demand  and  vehicle  travel  times  are  random,  unsatisfied  demand  is  backordered, 
and  the  objective  is  to  minimize  costs  due  to  holding  and  backordering  inventory  (cost  rates 
may  differ  by  retailer)  and  operating  the  vehicle.  One  of  the  crucial  decisions  in  our  problem 
is  the  vehicle  idling  policy:  when  the  vehicle  is  at  the  warehouse,  the  controller  can  either 
send  the  vehicle  out  with  a  full  load  or  let  the  vehicle  sit  idle;  because  we  employ  a  long  run 
average  cost  criterion,  if  the  vehicle  does  not  idle  and  the  "traffic  intensity"  of  the  system  is 
less  than  one,  then  an  infinite  amount  of  retailer  inventory  will  build  up  over  the  long  run. 

Two types of IRPs are analyzed: the first assumes fixed routing and the second allows
dynamic routing. We consider two variants of the fixed routing IRP: in the IRP with TSP
routing, when a vehicle leaves the warehouse it uses a pre-optimized tour of the m retailers,
which we refer to as the TSP (travelling salesman problem) tour. In addition to the vehicle
idling policy, the controller must decide how many units to deliver to each retailer, and this
decision is based on the current inventory levels at all retailers and on the remaining number
of units in the vehicle. The second variant is the IRP with direct shipping; in this case, each
time the vehicle leaves the warehouse it delivers all of its contents to a single retailer, and
the controller dynamically specifies which retailer to visit next. In the IRP with dynamic
routing, the controller decides, based upon the current inventory levels, whether to use a
TSP tour or direct shipping.

By only allowing TSP tours or direct shipping, we avoid an assault on the
combinatorial aspects of the embedded routing problem, and model these problems as queueing
control problems. Since these control problems appear to be analytically intractable, heavy
traffic analysis is employed in order to make further progress. To obtain an interesting and
nontrivial limiting control problem, we assume that the server (vehicle) must be busy most
of the time to meet average demand, the vehicle capacity is large, the tour completion time
is long and nearly deterministic, and the vehicle operating cost is large; these heavy traffic
conditions are stated more precisely later in the paper. Guided by the heavy traffic limit
theorems in Coffman, Puhalskii and Reiman (1995a,b), we uncover a time scale
decomposition in the heavy traffic limit; however, no weak convergence proofs are provided because
they would be very demanding and would distract us from our primary objectives: (i) to
gain insights into the nature of the optimal policies for the IRP, and (ii) to develop effective
policies for the operation of these systems. Under the traditional heavy traffic normalization,
where time and inventory are compressed by a factor of $n$ and $\sqrt{n}$, respectively (and $n$ is
going to infinity), the (embedded or averaged) total retailer inventory process is well
approximated by a one-dimensional reflected Brownian motion on the interval $(-\infty, w]$, where
the control parameter $w$ represents an aggregate base stock level that dictates the vehicle
idling policy. The drift and variance of the Brownian motion depend on whether the TSP
or direct shipping policy is being used, and in the dynamic routing problem the controller
dynamically switches between two Brownian motions (one for TSP and one for direct
shipping), each possessing its own drift, variance and cost structure. If we take the heavy traffic
normalization and slow down time by a factor of $\sqrt{n}$, then a fluid limit is obtained on this
faster time scale. A deterministic analysis at this time scale allows us to optimize over the
other operational decisions.
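
The reflected Brownian motion on $(-\infty, w]$ described above can be illustrated with a short simulation. The sketch below is only an Euler-type discretization under assumed values of the drift, variance, and barrier; the function name, the step size, and the parameter values are hypothetical and are not taken from the paper. The cumulative push at the barrier plays the role of the (scaled) idleness that keeps the total inventory from exceeding the base stock level.

```python
import math
import random


def simulate_rbm_upper_barrier(mu, sigma, w, z0, dt, n_steps, seed=0):
    """Euler-type sketch of a Brownian motion reflected at an upper barrier w.

    mu, sigma : assumed drift and standard deviation per unit time
    w         : upper barrier (aggregate base stock level)
    z0        : starting level (clipped to w)
    Returns the sampled path and the cumulative amount pushed down at w.
    """
    rng = random.Random(seed)
    z = min(z0, w)
    pushed = 0.0
    path = [z]
    pushes = [pushed]
    for _ in range(n_steps):
        # Unreflected increment over one time step.
        free = z + mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Any overshoot above w is removed and recorded as cumulative push.
        pushed += max(free - w, 0.0)
        z = min(free, w)
        path.append(z)
        pushes.append(pushed)
    return path, pushes


if __name__ == "__main__":
    path, pushes = simulate_rbm_upper_barrier(
        mu=0.5, sigma=1.0, w=2.0, z0=0.0, dt=0.01, n_steps=5000, seed=1
    )
    print("largest level reached:", max(path))
    print("total push at the barrier:", pushes[-1])
```

A positive drift pulls the process toward the barrier, so the push accumulates over time; with a nonpositive drift the process would wander off toward large backorders.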

Our  results  are  quite  explicit;  the  derived  controls  for  the  fixed  routing  IRPs  are  either 
given  in  closed  form  or  in  terms  of  parameters  that  are  solutions  to  certain  equations.  Because 
we  cannot  characterize  the  form  of  the  optimal  policy  for  the  diffusion  control  problem 
associated with the IRP with dynamic routing, we analyze a class of triple threshold policies
that  is  conjectured  to  contain  the  optimal  policy,  and  numerically  compute  the  optimal 
policy  to  the  diffusion  control  problem.  A  computational  study  is  carried  out  that  confirms 
the  accuracy  of  the  heavy  traffic  analysis  and  allows  us  to  obtain  insights  into  the  relative 
importance  of  the  various  operational  decisions  (e.g.,  the  vehicle  idling  policy,  static  vs. 
dynamic  allocation,  TSP  vs.  direct  shipping,  fixed  vs.  dynamic  routing). 

In  §2  and  §3,  we  analyze  the  IRP  with  TSP  routing  and  direct  shipping,  respectively. 
The  performance  of  these  two  routing  schemes  is  compared  in  §4  and  the  IRP  with  dynamic 
routing  is  analyzed  in  §5.  Our  computational  study  is  described  in  §6.  §7  contains  some 
concluding  remarks,  including  a  summary  of  our  key  findings. 

2      The  IRP  with  TSP  Routing 

2.1      Problem  Formulation 

Consider  a  system  where  a  single  vehicle  with  capacity  V  is  used  to  distribute  a  standard 
product to m geographically dispersed retailers. An infinite supply of the product is kept
at the central warehouse at no cost. Customers are served from the retailer inventories in
a  make-to-stock  fashion,  and  demand  that  cannot  be  served  immediately  is  backordered. 
When  the  vehicle  is  operating  the  following  policy  is  used:  the  vehicle  leaves  the  warehouse 
(indexed  as  station  0)  with  a  full  load  and  then  visits  all  the  retailers  in  a  predefined  sequence 
before  returning  empty.  Alternatively,  the  vehicle  may  idle  at  the  depot.  Though  the  order 
in which retailers are visited could be arbitrary, we assume that it is the solution to the
implied TSP, and refer henceforth to this service scheme as the TSP policy. Without loss of
generality,  we  assume  that  retailers  are  indexed  from  1  through  m  according  to  their  position 
in  the  TSP  tour. 

Two sources of variability are considered: customer demand and travel times. For
$i = 1, \ldots, m$, customer demand at retailer $i$ occurs according to an independent renewal
process $\{D_i(t), t \ge 0\}$ with rate $\lambda_i$ and squared coefficient of variation $c_{di}^2$ (variance of the
interdemand time divided by the square of the mean). The cumulative total demand in $[0, t]$
is denoted by $D(t) = \sum_i D_i(t)$, and $\lambda = \sum_i \lambda_i$ is the total demand rate. (In all summations
of this paper the index runs over the set of retailers $\{1, 2, \ldots, m\}$, unless explicitly indicated
otherwise.) Our results easily generalize to cases with correlated compound renewal
processes; see §6 of Reiman (1984) for details. The sequence of travel times between facilities
$i$ and $j$ is given by iid samples of the random variable $T_{ij}$, which has mean $\theta_{ij}$ and squared
coefficient of variation $c_{ij}^2$ ($i, j$ run from 0 to $m$). These travel times are independent of the
demand streams and of each other. Keeping with the convention in the literature, we assume
that pickup and delivery of units occur instantaneously; in practice, load/unload times tend
to be dwarfed by the travel times. (Although non-zero load/unload times can be
incorporated in a straightforward manner, the analysis becomes more tedious and their inclusion would
cloud the basic issues.) Hence, the mean and variance of the total time required to complete
the TSP tour are given by $\theta_T = \sum_{i=0}^{m-1} \theta_{i,i+1} + \theta_{m0}$ and
$s_T^2 = \sum_{i=0}^{m-1} \theta_{i,i+1}^2 c_{i,i+1}^2 + \theta_{m0}^2 c_{m0}^2$,
respectively, where the subscript "T" is mnemonic for TSP. For later use, we define the squared
coefficient of variation of the tour completion time as $c_T^2 = s_T^2/\theta_T^2$, and let $\{S_T(t), t \ge 0\}$
denote the counting process for TSP tour completions up to time $t$ assuming the vehicle is
continuously active in $[0, t]$.
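
The tour-time moments defined above are simple to compute from the leg data. The following is a minimal sketch, assuming the legs of the tour (warehouse to retailer 1, retailer 1 to retailer 2, and so on, back to the warehouse) are listed in tour order with their means and squared coefficients of variation; the function name and the sample numbers are hypothetical. It relies on the independence of travel times stated in the text, so that leg variances add.

```python
def tour_time_moments(leg_means, leg_scvs):
    """Mean, variance and squared coefficient of variation of the tour time.

    leg_means : mean travel time of each leg, in tour order
    leg_scvs  : squared coefficient of variation of each leg
    Assumes independent legs, so the tour variance is the sum of leg variances.
    """
    if len(leg_means) != len(leg_scvs):
        raise ValueError("leg_means and leg_scvs must have the same length")
    theta_T = sum(leg_means)
    # Variance of a leg = (mean)^2 * scv.
    s2_T = sum(mean ** 2 * scv for mean, scv in zip(leg_means, leg_scvs))
    c2_T = s2_T / theta_T ** 2
    return theta_T, s2_T, c2_T


if __name__ == "__main__":
    theta_T, s2_T, c2_T = tour_time_moments([2.0, 3.0, 5.0], [0.25, 0.0, 0.04])
    print(theta_T, s2_T, c2_T)
```

Because the leg variances add while the mean enters squared in the denominator, a tour made of many comparable legs has a small $c_T^2$, which is the sense in which long tours are "nearly deterministic".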

Because the route is fixed, only two operating control decisions remain: (i) whether
the vehicle should be busy or idle; (ii) while the vehicle is busy, how much of the load to leave
at each retailer. The busy/idle control is expressed in terms of the cumulative process $B_T(t)$,
which represents the amount of time the vehicle is busy in $[0, t]$. We do not allow tours to
be interrupted, and so the sequence $\tau_k$, $k = 1, 2, \ldots$ of tour completion epochs is given by
$\tau_k = \inf\{t \mid S_T(B_T(t)) \ge k\}$. The delivery allocations are modeled by the $m$-dimensional
control process $L_i(t)$, which represents the cumulative amount delivered to retailer $i$ up to
time $t$. In anticipation of future developments, let us express this control in terms of a
nominal delivery size for retailer $i$, denoted by $V_i$, and a dynamic allocation process $\epsilon_i^T(t)$.
We let $V_i = \lambda_i V/\lambda$ for all $i$, so that the nominal delivery size corresponds to allocating
the vehicle capacity $V$ among the retailers according to their relative demands. The load
allocation process is defined by

$$\epsilon_i^T(t) = L_i(t) - V_i S_T(B_T(t)) \quad \text{for } t \ge 0, \tag{1}$$

which represents the cumulative deviations from the nominal delivery size over past tours,
plus the amount delivered during the current cycle for retailer $i$. Because the tour completion
history can be observed, we need only specify the value of $\epsilon_i^T(t)$ to determine the total
deliveries to retailer $i$ up to time $t$. Notice that deviations from the nominal allocation
cancel out across the retailers and the process $\epsilon_T(t) = \sum_i \epsilon_i^T(t)$ represents the total amount
delivered during the current cycle. Because we assume that the vehicle leaves the warehouse
with a full load and returns empty, the dynamic load allocation process must satisfy

$$\epsilon_i^T(0) = 0 \quad \text{for all } i, \tag{2}$$

$$\epsilon_i^T(t^+) > \epsilon_i^T(t^-) \quad \text{only if retailer } i \text{ is visited at time } t, \tag{3}$$

$$\epsilon_i^T(t) \ge \epsilon_i^T(\tau_{k-1}) \quad \text{for } t \in (\tau_{k-1}, \tau_k) \text{ and all } i, \tag{4}$$

$$\epsilon_T(\tau_k^-) = V \quad \text{and} \tag{5}$$

$$\epsilon_T(\tau_k) = 0, \tag{6}$$

where the superscripts "-" and "+" denote the times just before and after an epoch.
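
As a small illustration of the nominal delivery sizes $V_i = \lambda_i V/\lambda$ and of the remark that deviations from the nominal allocation cancel across retailers, consider the sketch below. The function names and the numbers are hypothetical; the only assumption beyond the text is that the actual deliveries in a completed tour sum to the vehicle capacity (full load out, empty on return).

```python
def nominal_delivery_sizes(demand_rates, capacity):
    """Split the vehicle capacity in proportion to the retailers' demand rates."""
    total_rate = sum(demand_rates)
    if total_rate <= 0:
        raise ValueError("total demand rate must be positive")
    return [capacity * rate / total_rate for rate in demand_rates]


def tour_deviations(actual_deliveries, nominal_sizes):
    """Per-retailer deviation of one completed tour from the nominal sizes."""
    return [a - v for a, v in zip(actual_deliveries, nominal_sizes)]


if __name__ == "__main__":
    nominal = nominal_delivery_sizes([1.0, 3.0, 4.0], capacity=16.0)
    # A hypothetical tour that favors retailer 1 but still unloads 16 units.
    deviations = tour_deviations([5.0, 4.0, 7.0], nominal)
    print(nominal, deviations, sum(deviations))
```

Since both the nominal sizes and the actual deliveries of a completed tour add up to $V$, the deviations necessarily sum to zero.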

The number of units in inventory (or backordered if this quantity is negative) at
retailer $i$ at time $t$ is denoted by $Q_i(t)$, and the total inventory at the retailers is
$Q(t) = \sum_i Q_i(t)$. If we assume that $Q_i(0) = 0$ (which is without loss of generality, since a long run
average cost criterion is being used) then the current inventory $Q_i(t)$ equals the cumulative
deliveries minus the cumulative demand, which by (1) is given by

$$Q_i(t) = V_i S_T(B_T(t)) - D_i(t) + \epsilon_i^T(t) \quad \text{for } i = 1, \ldots, m, \; t \ge 0. \tag{7}$$


Define the cumulative vehicle idle time process $I(t)$ by

$$I(t) = t - B_T(t) \quad \text{for } t \ge 0, \tag{8}$$

so that the control policy $(B_T(t), \epsilon_i^T(t))$ must satisfy

$$B_T, \; \epsilon_i^T \quad \text{are nonanticipating with respect to } Q, \tag{9}$$

$$B_T \quad \text{is nondecreasing and continuous with } B_T(0) = 0, \tag{10}$$

$$I \quad \text{is nondecreasing with } I(0) = 0. \tag{11}$$

Our objective function includes transportation costs and inventory holding and
backordering costs. The travel cost rate per unit time, which includes vehicle depreciation, fuel
and driver cost, is $r$. Note that these costs can be combined because we are ignoring the
load/unload times (only the driver, but not the vehicle, is busy while loading and
unloading). Inventory costs are assumed to be piecewise-linear, with the holding cost rate (per unit
in inventory per unit time) at retailer $i$ denoted by $h_i$ and the backorder cost rate by $b_i$.
Because travel costs are incurred whenever the vehicle is busy, the travel cost rate $r$ can be
equivalently treated as a reward for exerting idleness. Hence the problem reduces to finding
a control policy $(B_T(t), \epsilon_i^T(t))$ to minimize

$$\limsup_{T \to \infty} \frac{1}{T} E\left[\int_0^T \sum_i \left(h_i [Q_i(t)]^+ + b_i [Q_i(t)]^-\right) dt - r I(T)\right] \tag{12}$$

subject to (2) - (11), where the "+" and "-" denote the positive and negative parts.
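
To make the structure of objective (12) concrete, the sketch below evaluates the bracketed quantity, divided by the horizon, along one finite sample path. It is only an illustration: it takes a single piecewise-constant inventory trajectory rather than an expectation or a limit, and the function name, argument layout, and numbers are hypothetical.

```python
def average_cost(times, inventories, idle_time, h, b, r):
    """Finite-horizon sample-path version of the cost in (12).

    times       : breakpoints t_0 < t_1 < ... < t_K, with horizon t_K - t_0
    inventories : inventories[k][i] is retailer i's level on [t_k, t_{k+1})
    idle_time   : cumulative vehicle idle time over the horizon
    h, b        : per-retailer holding and backorder cost rates
    r           : travel cost rate, credited here as a reward for idleness
    """
    horizon = times[-1] - times[0]
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    inventory_cost = 0.0
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        for level, h_i, b_i in zip(inventories[k], h, b):
            # Positive part is charged at h_i, negative part at b_i.
            inventory_cost += (h_i * max(level, 0.0) + b_i * max(-level, 0.0)) * dt
    return (inventory_cost - r * idle_time) / horizon


if __name__ == "__main__":
    cost = average_cost(
        times=[0.0, 1.0, 3.0],
        inventories=[[2.0, -1.0], [0.0, 3.0]],
        idle_time=1.0,
        h=[1.0, 1.0],
        b=[4.0, 2.0],
        r=0.5,
    )
    print(cost)
```

Subtracting $r I(T)$ differs from charging $r$ per unit of busy time only by the constant $r$ per unit time, which is why the two formulations lead to the same optimal policy.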

The  dynamic  stochastic  IRP,  as  formulated  in  (2)  -  (12),  does  not  seem  to  be  tractable. 
Even  under  Markovian  assumptions  for  the  underlying  random  processes,  the  control  space 
is enormous and the state space has $m + 2$ dimensions: the inventory/backorder level at each
retailer  and  the  location  and  total  contents  of  the  vehicle.  To  gain  further  understanding  of 
the  problem,  we  analyze  it  when  the  system  operates  in  the  heavy  traffic  regime. 

2.2      Heavy  Traffic  Normalizations 

We begin our heavy traffic development by centering the service completion and demand
processes; define the centered processes $\bar{S}_T(t) = S_T(t) - \theta_T^{-1} t$ and $\bar{D}(t) = D(t) - \lambda t$. It is
convenient to define the process

$$\chi(t) = \left(\frac{V}{\theta_T} - \lambda\right) t + V \bar{S}_T(B_T(t)) - \bar{D}(t); \tag{13}$$

we refer to this quantity as the netput process, although it does not correspond precisely
to the netput processes constructed in the heavy traffic analysis of conventional queueing
networks (e.g., Peterson 1991). Summing the inventory evolution equations (7) over all
retailers and substituting the relevant definitions yields

$$Q(t) = \chi(t) - \frac{V}{\theta_T} I(t) + \epsilon_T(t). \tag{14}$$
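
As a check on (14), the substitution can be written out step by step. The sketch below assumes the centered processes are $\bar{S}_T(t) = S_T(t) - t/\theta_T$ and $\bar{D}(t) = D(t) - \lambda t$, uses $\sum_i V_i = V$ and $B_T(t) = t - I(t)$ from (8), and writes $\epsilon_T(t)$ for the sum of the retailer allocation processes and $\chi(t)$ for the netput process of (13); the symbol names follow our reading of the definitions above.

```latex
\begin{align*}
Q(t) &= \sum_i Q_i(t) = V S_T(B_T(t)) - D(t) + \epsilon_T(t) \\
     &= V\left[\bar{S}_T(B_T(t)) + \frac{B_T(t)}{\theta_T}\right]
        - \bar{D}(t) - \lambda t + \epsilon_T(t) \\
     &= \left(\frac{V}{\theta_T} - \lambda\right) t + V \bar{S}_T(B_T(t)) - \bar{D}(t)
        - \frac{V}{\theta_T} I(t) + \epsilon_T(t) \\
     &= \chi(t) - \frac{V}{\theta_T} I(t) + \epsilon_T(t).
\end{align*}
```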

The heavy traffic approximation for the IRP may be found as the limit (as $n \to \infty$)
of  a  sequence  of  systems  indexed  by  the  heavy  traffic  parameter  n.  Even  though  no  weak 
convergence  proofs  will  be  undertaken  here,  because  some  of  the  scalings  that  we  introduce 
are  non-traditional,  we  index  quantities  with  n  (in  an  appropriate  place)  to  make  the  scalings 
clear.  This  indexing  will  be  confined  to  this  subsection;  for  the  rest  of  the  paper  we  leave 
off  the  index,  with  the  understanding  that  we  are  considering  a  single  system  that  has  an 
associated  value  of  n.  The  parameter  n  can  be  thought  of  as  a  large  integer  (e.g.,  100)  but 
(as  is  typically  the  case)  the  policy  recommendations  that  emerge  from  our  heavy  traffic 
analysis  are  independent  of  n.  The  parameter  n  is  used  to  normalize  the  various  processes 
according to standard heavy traffic conventions (notice that only the process $B_T$ undergoes
a  "fluid"  scaling): 


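$$W_i^{(n)}(t) = \frac{Q_i(nt)}{\sqrt{n}} \quad \text{for all } i, \qquad W^{(n)}(t) = \sum_i W_i^{(n)}(t) = \frac{Q(nt)}{\sqrt{n}}, \tag{15}$$

$$\hat{\chi}^{(n)}(t) = \frac{\chi(nt)}{\sqrt{n}}, \qquad \hat{S}_T^{(n)}(t) = \frac{\bar{S}_T(nt)}{\sqrt{n}}, \qquad \hat{D}^{(n)}(t) = \frac{\bar{D}(nt)}{\sqrt{n}}, \tag{16}$$

$$Y^{(n)}(t) = \frac{I(nt)}{\sqrt{n}}, \qquad \hat{B}_T^{(n)}(t) = \frac{B_T(nt)}{n} \qquad \text{and} \qquad \hat{\epsilon}_T^{(n)}(t) = \frac{\epsilon_T(nt)}{\sqrt{n}}. \tag{17}$$

The processes $W^{(n)}$ and $Y^{(n)}$ represent the normalized inventory and idleness, respectively; to
reduce the amount of notation, the normalized versions of the remaining processes contain
a "hat". Employing these scalings in (13) and (14) yields expressions for the normalized
netput process

$$\hat{\chi}^{(n)}(t) = \sqrt{n}\left(\frac{V^{(n)}}{\theta_T^{(n)}} - \lambda^{(n)}\right) t + V^{(n)} \hat{S}_T^{(n)}\left(\hat{B}_T^{(n)}(t)\right) - \hat{D}^{(n)}(t) \tag{18}$$

and the normalized inventory process

$$W^{(n)}(t) = \hat{\chi}^{(n)}(t) - \frac{V^{(n)}}{\theta_T^{(n)}} Y^{(n)}(t) + \hat{\epsilon}_T^{(n)}(t). \tag{19}$$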

To  obtain  a  nontrivial  control  problem  in  heavy  traffic,  we  normalize  the  system 
parameters  in  a  particular  fashion.  The  demand  and  inventory  cost  parameters  are  not 
scaled,  and  the  other  parameters  are  normalized  as  follows: 

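$$\hat{V}^{(n)} = \frac{V^{(n)}}{\sqrt{n}}, \tag{20}$$

$$\hat{\theta}_T^{(n)} = \frac{\theta_T^{(n)}}{\sqrt{n}}, \tag{21}$$

$$\mu_T^{(n)} = \sqrt{n}\left(\frac{V^{(n)}}{\theta_T^{(n)}} - \lambda^{(n)}\right) > 0, \tag{22}$$

$$\hat{c}_T^2(n) = \sqrt{n}\, c_T^2(n), \tag{23}$$

$$\hat{r}^{(n)} = \frac{r^{(n)}}{n}. \tag{24}$$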

We assume that all the quantities on the left side of definitions (20)-(24) converge to finite
and positive limits as $n \to \infty$. Equations (20)-(24) are the heavy traffic conditions and they
specify, in a unified manner via the heavy traffic parameter $n$, the relative magnitudes of
the various system parameters. These conditions are more extensive than those enforced
in traditional queueing systems, and therefore warrant some discussion. Since the natural
definition of the traffic intensity is $\rho_T = \lambda\theta_T/V$, condition (22) is the "traditional" heavy
traffic condition, which requires that $\rho_T$ be close to, but less than, unity.

Now we turn to conditions (20)-(21). Because the state space is compressed by a factor
of $\sqrt{n}$ in the heavy traffic normalization, the vehicle capacity in terms of scaled inventory
units is $V^{(n)}/\sqrt{n}$. Hence, if $V^{(n)}$ were $O(1)$ it would vanish in the limit, and our system
would reduce to a variant of the multiclass make-to-stock queue analyzed in Wein (1992).
Although such a model would be tractable, a limit that employs infinitesimal vehicle sizes
fails to capture the essence of the behavior of the original system. Therefore, we enforce
condition (20), so that $V^{(n)}$ is $O(\sqrt{n})$ and the bulkiness of the retailer deliveries is retained
in the limit. However, since the demand rate $\lambda^{(n)}$ is unscaled, we need to also scale the tour
lengths according to (21) to ensure that the ratio $V^{(n)}/\theta_T^{(n)}$ converges to a finite and positive
limit.

Turning to (23), note that since the vehicle capacity is $O(\sqrt{n})$, if $c_T^2(n)$ is not
scaled then a standard calculation shows that the variance term for the normalized netput
process $\hat{\chi}^{(n)}$ is $O(n)$ and hence approaches infinity in the heavy traffic limit. Since
$s_T^2(n) = c_T^2(n) (\theta_T^{(n)})^2$, by (21) and (23) we obtain $s_T^2(n) = \sqrt{n}\, \hat{s}_T^2(n)$, where
$\hat{s}_T^2(n) = (\hat{\theta}_T^{(n)})^2 \hat{c}_T^2(n)$. Thus, by
enforcing condition (23), we assume that the variance of the tour completion time is $O(\sqrt{n})$;
in contrast, this quantity would be $O(n)$ if travel times were simply multiplied by $\sqrt{n}$. One
way to achieve (23) is to assume that the travel time of the tour is the sum of $\sqrt{n}$ iid finite
variance travel times. This construction could arise by superimposing the warehouse and
retailer locations on a two-dimensional map with a fine grid, in such a way that the tour
passes through approximately $\sqrt{n}$ grid points. However, this modeling artifice is problematic
(because adjacent travel times would not likely be independent and the necessary data would
be tedious to collect) and is not pursued here; see Rubio (1995) for further details.

Finally,  we  need  to  normalize  the  cost  parameters  to  account  for  distortions  in  the 
relative  magnitudes  of  the  transportation  and  inventory  costs  that  result  from  the  heavy 
traffic scaling. The appropriate scaling is to allow the travel cost rate $r^{(n)}$ to be approximately
$n$ times larger than the inventory cost rates, as in condition (24); see Rubio (1995) for a detailed
explanation. 

In  summary,  the  heavy  traffic  conditions  assume  that  the  vehicle  must  be  busy  the 
great  majority  of  the  time  to  meet  average  demand,  the  vehicle  capacity  must  be  large,  the 
tour  completion  time  must  be  large  and  nearly  deterministic,  and  the  travel  cost  rate  must 
be  very  large  relative  to  the  inventory  cost  rates.  The  computational  study  in  §6  reveals 
that  our  results  are  rather  insensitive  to  these  conditions. 
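
The relative magnitudes implied by the heavy traffic conditions can be checked numerically. The sketch below is an illustration under our reading of conditions (20), (21), (23) and (24): capacity and tour length grow like $\sqrt{n}$, the squared coefficient of variation of the tour time shrinks like $1/\sqrt{n}$, and the travel cost rate grows like $n$. It also assumes that condition (22) takes the form $\sqrt{n}(V/\theta_T - \lambda) > 0$. All function names and numerical values are hypothetical.

```python
import math


def unscale_parameters(n, V_hat, theta_hat, c2_hat, r_hat):
    """Build the parameters of the n-th system from O(1) normalized values."""
    root_n = math.sqrt(n)
    V = root_n * V_hat              # vehicle capacity, O(sqrt(n))
    theta_T = root_n * theta_hat    # mean tour time, O(sqrt(n))
    c2_T = c2_hat / root_n          # tour-time scv, O(1/sqrt(n))
    s2_T = c2_T * theta_T ** 2      # tour-time variance, O(sqrt(n))
    r = n * r_hat                   # travel cost rate, O(n)
    return {"V": V, "theta_T": theta_T, "c2_T": c2_T, "s2_T": s2_T, "r": r}


def traffic_intensity(lam, theta_T, V):
    """rho_T = lambda * theta_T / V."""
    return lam * theta_T / V


def scaled_drift(n, lam, theta_T, V):
    """sqrt(n) * (V / theta_T - lambda), our reading of condition (22)."""
    return math.sqrt(n) * (V / theta_T - lam)


if __name__ == "__main__":
    p = unscale_parameters(n=100, V_hat=5.0, theta_hat=4.0, c2_hat=0.5, r_hat=3.0)
    print(p)
    print(traffic_intensity(1.2, p["theta_T"], p["V"]))
    print(scaled_drift(100, 1.2, p["theta_T"], p["V"]))
```

With these sample values the vehicle is busy about 96% of the time, the tour time has a coefficient of variation of roughly 0.22, and the scaled drift is of order one, which is the regime the conditions are meant to describe.
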
2.3      System  Behavior  in  Heavy  Traffic

This subsection considers the limiting behavior of equations (18)-(19). Following Harrison
(1988), we replace $\hat{B}_T(t)$, which is the fluid scaled busy time process, by $\rho_T t$; the justification
for this substitution is that any policy that does not utilize the vehicle for a fraction $\rho_T$ of
the time over a sufficiently long time interval will generate extremely large inventory costs.
In addition, we consider the normalized netput process embedded at tour completion epochs.
Without some embedding or averaging the limit of the normalized netput process would not
exist because it varies (after normalization) by $O(1)$ on a time of length $O(1/\sqrt{n})$. The
process we consider is thus defined as

$$\hat{\chi}(t) = \sqrt{n}\left(\frac{V}{\theta_T} - \lambda\right)\tau_{k-1} + V \hat{S}_T(\rho_T \tau_{k-1}) - \hat{D}(\tau_{k-1}) \quad \text{for } t \in [\tau_{k-1}, \tau_k).$$

With this definition the standard tools of weak convergence (the functional central limit
theorem for renewal processes, the random time change theorem and the continuous mapping
theorem; see Billingsley 1968) can be used to show that the normalized netput process
embedded at tour completion epochs $\hat{\chi}$ is well approximated by a Brownian motion $X$ with
drift $\mu_T$ and variance $\sigma_T^2 = \lambda(c_d^2 + V c_T^2)$.

Now we turn our attention to the process $\hat{\epsilon}_T$. This process equals zero at tour
completion epochs and has jumps of size $O(1)$ whenever a delivery occurs and at the end of the
cycle. In addition, because the tour length is $O(\sqrt{n})$ by (21), a tour takes only $O(1/\sqrt{n})$
time units under the heavy traffic normalization (where time is compressed by the factor
$n$); hence, tours occur instantaneously in the heavy traffic limit. Consequently, neither $\hat{\epsilon}_T(t)$
nor the $m$-dimensional normalized inventory process converges to a limit in the usual sense.
However, if we start with the heavy traffic normalization and expand time by a factor of $\sqrt{n}$,
then a fluid scaling is obtained, where both time and space are compressed by the factor
$\sqrt{n}$. At this faster time scale, the Brownian motion $X$ remains constant, and the
individual inventory levels move in a deterministic fashion, decreasing at a finite rate between the
jumps at delivery epochs. The process $\hat{\epsilon}_T$ traverses through many tours before $X$ changes
value, and equals zero at each tour completion epoch.

This is similar to the state of affairs in the heavy traffic results of Coffman, Puhalskii
and Reiman. In their exhaustive polling system, the total queue length process behaves as a
one-dimensional  diffusion  under  the  slow  time  scale  associated  with  the  heavy  traffic  scaling, 
and  the  individual  queues  move  as  a  fluid  under  the  faster  time  scale  associated  with  the 
fluid  limit.  This  time  scale  decomposition  gives  rise  to  a  heavy  traffic  averaging  principle 
(HTAP)  that  implies  the  following:  for  purposes  of  calculating  performance  measures  for  the 
individual  queues,  one  can  analyze  the  deterministic  fluid  cycle  for  each  fixed  value  of  the 
diffusion process. There are four key differences between the HTAP in the polling system
and  in  the  IRP.  First,  the  fluid  trajectories  are  different.  In  the  polling  problem,  the  fluid 
paths  look  like  those  for  the  economic  production  quantity  (EPQ)  model:  they  go  up  and 
down  at  a  finite  rate.  In  the  IRP,  the  paths  look  like  those  from  the  economic  order  quantity 
(EOQ)  model:  they  go  down  at  finite  rate  but  go  up  in  jumps  at  delivery  epochs.  The 
second  key  difference  relates  to  the  issue  of  "control".  In  the  polling  system  the  exhaustive 
discipline  guarantees  that  whenever  the  server  switches  from  a  queue,  that  queue  is  empty. 
This  exerts  a  type  of  control  that  keeps  the  multidimensional  process  well  behaved.  There 
is  no  such  natural  mechanism  in  the  IRP;  we  must  introduce  a  dynamic  allocation  scheme 
to  keep  the  multidimensional  process  well  behaved.  (This  is  done  below.)  Third,  the  time 
scale  decomposition  in  the  polling  system  emerged  as  a  consequence  of  the  standard  heavy 
traffic  normalization,  whereas  in  the  IRP  it  follows  from  the  scaling  assumptions  (20)-(23). 
This  gives  rise  to  the  fourth  key  difference  —  the  proof  of  the  HTAP.  In  the  polling  context 
a  difficult  proof  involving  a  threshold  queue  was  needed.  For  the  IRP  the  proof  follows 
from  assumptions  (20)-(23)  and  the  properties  of  the  dynamic  control  (we  do  not,  however, 
provide  the  proof). 

2.4      The  Limiting  Control  Problem 

As described above, the analysis of our limiting control problem decomposes into two time
scales. On the slow time scale associated with the diffusion limit, we can average out the
effects of the controlled allocation process $\epsilon^T$ and choose the vehicle idling policy. This policy
is generated by the normalized cumulative idleness $Y$, which we assume is nondecreasing and
right continuous. Let $Z(t) = X(t) - (V/\theta_T)\,Y(t)$; this is the process that would be obtained if
one were to observe the total inventory only at tour completion epochs. We refer to this
process as the total embedded inventory process, to differentiate it from $W$.

At the faster time scale where the total embedded inventory is fixed at $Z(t) = x$ and
the individual inventories behave as a fluid, we must find the optimal allocation policy that
minimizes inventory costs per unit time. The limit cycle associated with an allocation policy
can be viewed as a closed $m$-dimensional path, and the optimal allocation policy reduces
to the problem of optimally placing a deterministic cycle in $R^m$. Let $g(x)$ represent the
inventory cost per unit time that is achieved by optimally locating a cycle when $Z(t) = x$.

We can now state the limiting stochastic control problem for the IRP with TSP
routing: (i) find the optimal cycle placement for a given total embedded inventory level
$Z(t) = x$, and its corresponding inventory cost rate $g(x)$; and (ii) choose the nondecreasing
right continuous process $Y$ to minimize

$$\limsup_{T\to\infty} \frac{1}{T}\, E\left[\int_0^T g(Z(t))\,dt - r\,Y(T)\right] \tag{25}$$

subject to

$$Z(t) = X(t) - \frac{V}{\theta_T}\,Y(t). \tag{26}$$

The cycle placement problem is a nonlinear program and problem (25)-(26) is a singular
control problem for Brownian motion; these two problems are solved in the next two
subsections.


2.5      Optimal  Cycle  Placement  and  Dynamic  Allocation 

To optimally place the limit cycle, we follow the approach used in Markowitz, Reiman and
Wein (1995) for the stochastic economic lot scheduling problem (ELSP). Let us fix $Z(t) = x$,
and denote the individual fluid inventory levels by $\bar{W}_i(t) = Q_i(\sqrt{n}\,t)/\sqrt{n}$ (a "bar" will be
used to denote quantities introduced for the fluid limit). The cycle placement can be defined
in many ways and we choose to specify it by the vector $(x_1, x_2, \ldots, x_m)$, where $x_i$ represents
the lowest point during the cycle of $\bar{W}_i(t)$.

The choice of optimal $(x_1, \ldots, x_m)$ is a constrained optimization problem: we want to
choose $(x_1, \ldots, x_m)$ to minimize the inventory cost rate subject to consistency with the total
embedded inventory level. The inventory cost rate will come from the averaging principle to
be described below. We first deal with the consistency issue.

Figure 1: Fluid inventory evolution at retailer i during a nominal allocation cycle.

To establish the relationship between the cycle placement variables $x_i$ and the total
embedded inventory level $x$, we need to introduce some new notation. Denote the mean
travel time along the TSP path between any two sites $i, j \in \{0, 1, \ldots, m\}$ by $\theta_{ij}^{TSP}$; in terms
of the inter-site mean travel times $\theta_{ij}$, these quantities are defined by
$\theta_{ij}^{TSP} = \sum_{k=i}^{j-1} \theta_{k,k+1}$ for $j > i$ and $\theta_{ii}^{TSP} = 0$. Since time is compressed by a factor of $\sqrt{n}$
in the fluid limit, define the corresponding travel times for the fluid model by
$\bar{\theta}_{ij}^{TSP} = \theta_{ij}^{TSP}/\sqrt{n}$. If we measure time over a cycle so that the vehicle leaves the warehouse
at $t = 0$, then $\bar{W}_i(0)$ is related to its corresponding cycle placement value $x_i$ by (see Figure 1)
$\bar{W}_i(0) = x_i + \lambda_i \bar{\theta}_{0i}^{TSP}$ for $i = 1, \ldots, m$. Summing these inventory levels over all retailers,
we obtain

$$\sum_i x_i = x - \sum_i \lambda_i \bar{\theta}_{0i}^{TSP}. \tag{27}$$

Given a vector $(x_1, \ldots, x_m)$ satisfying (27), we want to determine the associated
inventory cost rate. The basic intuition of the time scale decomposition appears to provide the
solution. Long term stability requires that, in the long run, the average amount delivered
to retailer $i$ per cycle be $V_i = \lambda_i V/\lambda$; under the diffusion and fluid scalings, this delivery
size is given by $\bar{V}_i = V_i/\sqrt{n}$. Viewing the fluid inventory of retailer $i$ in isolation with a
delivery of $\bar{V}_i$ on each tour cycle, we see a fluid starting out at $x_i + \bar{V}_i$ immediately after
delivery, decreasing at a constant rate until $x_i$ is reached just prior to the next delivery (see
Figure 1). The inventory cost component associated with retailer $i$ can then be calculated by
considering the normalized inventory process $\bar{W}_i$ to be uniformly distributed on the interval
$[x_i, x_i + \bar{V}_i]$. Although this approach provides the correct inventory cost rate, it turns out that
a dynamic (state-dependent) delivery size is needed to keep the long run average cost finite.
Simply delivering $V_i$ to retailer $i$ on every visit will result in an infinite long run average cost
because this allocation leads to a null recurrent process. To see this, note that under this
simple allocation scheme, the drift of $W_i(t)$ does not depend on $W_i(t)$ and equals zero, since
the inventory "arrival" rate $V_i/\theta_T$ equals the demand rate $\lambda_i$ when $\rho_T = 1$; with $\rho_T < 1$ a
similar result is generated with the effective arrival rate being $\rho_T V_i/\theta_T$. With a zero drift,
central limit theorem arguments indicate that the inventory or backlog will grow as $\sqrt{t}$. In
fact, similar arguments can be used to explain some numerical results in Federgruen and
Katalan (1994) and Wein, where a state-independent policy performs poorly in a stochastic
setting.

A simple dynamic allocation policy escapes this difficulty. We determine delivery sizes
at the warehouse as follows. Given a fluid inventory level $(w_1, \ldots, w_m)$ when the vehicle is
at the warehouse, the fluid limit of the inventory immediately before delivery is $w_i - \lambda_i \bar{\theta}_{0i}^{TSP}$.
If possible, we would like to deliver $d_i = x_i + \bar{V}_i - w_i + \lambda_i \bar{\theta}_{0i}^{TSP}$ to retailer $i$, in order to bring
the fluid inventory level immediately after delivery to $x_i + \bar{V}_i$. If $d_i \ge 0$ for $i = 1, \ldots, m$
then this delivery allocation is feasible. If $d_i < 0$ for some $i$, then, since $\sum_i d_i = \bar{V}$, we must
have $\sum_{\{i:\, d_i > 0\}} d_i > \bar{V}$. This is a transient state for the fluid limit; within a finite number of
cycles we will have $d_i \ge 0$ for all $i$. This transient interval has no effect on the long run
average inventory cost, and an averaging principle will hold under this dynamic allocation
scheme. In summary, the essence of the averaging principle here is that, under this dynamic
allocation scheme, when $Z(t) = x$, $\bar{W}_i(t)$ can be treated as if it is uniformly distributed
between $x_i$ and $x_i + \bar{V}_i$.
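The target delivery sizes described above can be sketched in a few lines. This is only an illustration under the fluid-model assumptions of this subsection; the function names and the numerical values are hypothetical and do not come from the paper.

```python
def target_deliveries(w, x, v, lam, theta0):
    """Fluid delivery sizes d_i = x_i + V_i - w_i + lambda_i * theta_0i.

    w      : fluid inventory levels observed when the vehicle is at the warehouse
    x      : cycle placement (lowest point of each retailer's cycle)
    v      : nominal fluid delivery sizes V_i
    lam    : demand rates lambda_i
    theta0 : fluid travel times from the warehouse to each retailer along the tour
    """
    return [xi + vi - wi + li * ti
            for wi, xi, vi, li, ti in zip(w, x, v, lam, theta0)]


def is_feasible(d):
    """The allocation can be carried out as planned only if no d_i is negative."""
    return all(di >= 0 for di in d)


# Illustrative two-retailer instance (values are assumptions).
# The placement satisfies (27): sum(x) = 8.5 - (3*1 + 2*3) = -0.5.
d = target_deliveries(w=[5.0, 3.5], x=[-1.0, 0.5], v=[6.0, 4.0],
                      lam=[3.0, 2.0], theta0=[1.0, 3.0])
```

Whenever the observed levels sum to the total embedded inventory $x$ and the placement satisfies (27), the targets sum to the full load, which is why a negative $d_i$ forces the positive targets to exceed the load in total.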

The average inventory cost per unit time is equal to the cost incurred over a cycle
divided by the corresponding cycle length. The cost at retailer $i$ may be obtained by simple
geometric arguments for any cycle placement $x_i$. When the cycle placement is sufficiently
high (low) so that the inventory remains positive (negative) for the duration of the cycle, the
cost is simply the holding (backordering) rate multiplied by the absolute value of $x_i + \bar{V}_i/2$,
which is the average inventory level over a cycle. When the inventory changes sign during the
cycle the total holding (backordering) cost over a cycle equals the area of one of the triangles
above (below) the time axis multiplied by $h_i$ ($b_i$). To obtain the time average inventory cost
when there is a sign change we sum the areas of these two triangles and divide by the cycle
length. In the heavy traffic limit, the amount of fluid delivered per cycle, $\bar{V}$, equals the
amount demanded per cycle, which is $\lambda\bar{\theta}_T$; hence, we set the cycle length in the fluid model
equal to $\bar{V}/\lambda$, rather than $\bar{\theta}_T$. In summary, we have the following expression for retailer $i$:

$$g_i(x_i) = \begin{cases}
h_i\left(x_i + \dfrac{\bar{V}_i}{2}\right) & \text{if } x_i \ge 0, \\[2mm]
\dfrac{h_i + b_i}{2\bar{V}_i}\,x_i^2 + h_i x_i + \dfrac{h_i \bar{V}_i}{2} & \text{if } -\bar{V}_i < x_i < 0, \\[2mm]
-b_i\left(x_i + \dfrac{\bar{V}_i}{2}\right) & \text{if } x_i \le -\bar{V}_i.
\end{cases} \tag{28}$$

Notice that $g_i(x_i)$ is a convex function of $x_i$. With equation (28) in hand, the cycle placement
problem is to minimize $\sum_i g_i(x_i)$ subject to (27).
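As a quick check on the geometric argument, the sketch below evaluates the piecewise cost rate of (28) and compares it with a direct average of holding and backordering costs over an inventory level that is uniform on $[x_i, x_i + \bar{V}_i]$. The function names and test values are illustrative assumptions.

```python
def retailer_cost_rate(x, v, h, b):
    """Inventory cost rate g_i(x_i) of equation (28) for cycle placement x,
    fluid delivery size v, holding cost h and backorder cost b."""
    if x >= 0:
        return h * (x + v / 2)
    if x <= -v:
        return -b * (x + v / 2)
    return (h + b) / (2 * v) * x ** 2 + h * x + h * v / 2


def uniform_average_cost(x, v, h, b, steps=100_000):
    """Midpoint-rule average of h*w^+ + b*w^- for w uniform on [x, x + v]."""
    total = 0.0
    for k in range(steps):
        w = x + v * (k + 0.5) / steps
        total += h * w if w > 0 else -b * w
    return total / steps


# Placement -1 with delivery size 4: inventory is negative for a quarter of the cycle.
closed_form = retailer_cost_rate(-1.0, 4.0, 1.0, 3.0)
numerical = uniform_average_cost(-1.0, 4.0, 1.0, 3.0)
```

The three branches agree at $x_i = 0$ and $x_i = -\bar{V}_i$, so $g_i$ is continuous, and its slope rises from $-b_i$ to $h_i$, which is the convexity noted above.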

Let us make the innocuous assumption that $b_i > h_i$ for all $i$, and define the labeling
conventions $h_\ell = h = \min_i h_i$ and $b_p = b = \min_i b_i$, where $\ell = p$ is allowed. A closed-
form solution to the cycle placement problem is found by using constraint (27) to turn the
problem into one of unconstrained optimization over $m - 1$ variables; readers are referred to
an analogous optimization in Markowitz, Reiman and Wein for further details. The solution
yields the vector of optimal placements $x^*$ and $g(x)$, the inventory cost as a function of the
total embedded inventory $x$. Not surprisingly, $g(x)$ is quadratic with linear edges in the
inventory level $x$.
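Proposition 1. The solution to the cycle placement problem is

Region 1: $x \le \alpha_T = \sum_i \lambda_i \bar{\theta}_{0i}^{TSP} - \sum_i \dfrac{b + h_i}{b_i + h_i}\,\bar{V}_i$,

$$x_i^* = -\frac{b + h_i}{b_i + h_i}\,\bar{V}_i \quad \text{for } i \ne p,$$

$$x_p^* = x - \sum_i \lambda_i \bar{\theta}_{0i}^{TSP} + \sum_{i \ne p} \frac{b + h_i}{b_i + h_i}\,\bar{V}_i,$$

$$g(x) = -bx + a_1,$$

$$a_1 = b \sum_i \lambda_i \bar{\theta}_{0i}^{TSP} + \frac{1}{2}\sum_i h_i \bar{V}_i - \frac{1}{2}\sum_i \frac{(b + h_i)^2}{b_i + h_i}\,\bar{V}_i.$$

Region 2: $\alpha_T < x < \beta_T = \sum_i \lambda_i \bar{\theta}_{0i}^{TSP} - \sum_i \dfrac{h_i - h}{b_i + h_i}\,\bar{V}_i$,

$$x_i^* = \frac{\bar{V}_i}{b_i + h_i}\left(2a_2 x + a_3 - h_i\right),$$

$$g(x) = a_2 x^2 + a_3 x + a_4,$$

$$a_2 = \left[\,2\sum_i \frac{\bar{V}_i}{b_i + h_i}\right]^{-1},$$

$$a_3 = 2a_2\left(\sum_i \frac{h_i \bar{V}_i}{b_i + h_i} - \sum_i \lambda_i \bar{\theta}_{0i}^{TSP}\right),$$

$$a_4 = \frac{a_3^2}{4a_2} + \frac{1}{2}\sum_i \frac{b_i h_i \bar{V}_i}{b_i + h_i}.$$

Region 3: $x \ge \beta_T$,

$$x_i^* = -\frac{h_i - h}{b_i + h_i}\,\bar{V}_i \quad \text{for } i \ne \ell,$$

$$x_\ell^* = x - \sum_i \lambda_i \bar{\theta}_{0i}^{TSP} + \sum_{i \ne \ell} \frac{h_i - h}{b_i + h_i}\,\bar{V}_i,$$

$$g(x) = hx + a_5,$$

$$a_5 = -h \sum_i \lambda_i \bar{\theta}_{0i}^{TSP} + \frac{1}{2}\sum_i h_i \bar{V}_i - \frac{1}{2}\sum_i \frac{(h_i - h)^2}{b_i + h_i}\,\bar{V}_i.$$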

In the region where the total inventory is much greater (smaller) than zero, the optimal
cycle holds (backorders) most of the inventory at retailer $\ell$ ($p$), where it is cheapest to do
so, while the cycle for the rest of the retailers remains close to zero. The exact level for
each site depends upon two factors: the difference between its holding (backordering) cost
and $h$ ($b$), and its nominal delivery size (or equivalently, the proportion of demand that the
particular retailer represents). In the region where the total inventory is close to zero, the
cycle placement at each retailer varies linearly with the embedded inventory in the system.
It is worth noting that, for the symmetric cost case (i.e., when $h_i = h$ and $b_i = b$ for
all $i = 1, \ldots, m$), the more natural solution of

$$x_i^* = \frac{\lambda_i}{\lambda}\left(x - \sum_k \lambda_k \bar{\theta}_{0k}^{TSP}\right) \quad \text{if } \alpha_T < x < \beta_T$$

is also optimal.
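The piecewise form of $g(x)$ can be sanity-checked numerically. The sketch below is not the authors' derivation: it evaluates the cost that follows from minimizing $\sum_i g_i(x_i)$ under constraint (27) through a common slope (Lagrange multiplier) across retailers, and compares it with a brute-force grid search for a two-retailer instance. All names and parameter values are assumptions made for illustration; `c` stands for $\sum_i \lambda_i \bar{\theta}_{0i}^{TSP}$.

```python
def retailer_cost(x, v, h, b):
    """Per-retailer cost rate, equation (28)."""
    if x >= 0:
        return h * (x + v / 2)
    if x <= -v:
        return -b * (x + v / 2)
    return (h + b) / (2 * v) * x ** 2 + h * x + h * v / 2


def cost_closed_form(x, c, v, h, b):
    """Optimal total cost g(x): linear, quadratic, linear in the three regions."""
    h_min, b_min = min(h), min(b)
    cap = sum(vi / (bi + hi) for vi, hi, bi in zip(v, h, b))
    shift = sum(hi * vi / (bi + hi) for vi, hi, bi in zip(v, h, b))
    alpha = c - b_min * cap - shift          # lower threshold
    beta = c + h_min * cap - shift           # upper threshold
    half_hv = 0.5 * sum(hi * vi for vi, hi in zip(v, h))
    if x <= alpha:
        a1 = b_min * c + half_hv - 0.5 * sum(
            (b_min + hi) ** 2 * vi / (bi + hi) for vi, hi, bi in zip(v, h, b))
        return -b_min * x + a1
    if x >= beta:
        a5 = -h_min * c + half_hv - 0.5 * sum(
            (hi - h_min) ** 2 * vi / (bi + hi) for vi, hi, bi in zip(v, h, b))
        return h_min * x + a5
    slope = (x - c + shift) / cap            # common marginal cost across retailers
    return 0.5 * cap * slope ** 2 + 0.5 * sum(
        bi * hi * vi / (bi + hi) for vi, hi, bi in zip(v, h, b))


def cost_brute_force(x, c, v, h, b, span=20.0, step=0.001):
    """Two-retailer grid search over x_1, with x_2 fixed by constraint (27)."""
    y = x - c
    n = round(span / step)
    best = float("inf")
    for k in range(-n, n + 1):
        x1 = k * step
        best = min(best, retailer_cost(x1, v[0], h[0], b[0])
                   + retailer_cost(y - x1, v[1], h[1], b[1]))
    return best
```

For example, with `v = [2.0, 3.0]`, `h = [1.0, 2.0]`, `b = [4.0, 5.0]` and `c = 1.0`, the two functions agree to within the grid resolution for total inventory levels taken from each of the three regions.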


2.6      Optimal  Base  Stock  Level 

Now that $g(x)$ is known, we can proceed with the solution to the one-dimensional Brownian
control problem. The following proposition is proved in Appendix A of Rubio.
Proposition 2. The optimal solution to (25)-(26) is $Y^*(t) = \sup_{0 \le s \le t}\{X(s) - z_T^*\}^+$ for
some base stock level $z_T^*$.

Hence, the optimal solution is the local time of the Brownian motion at the barrier
$z_T^*$, and the optimally controlled process $Z$ is a reflected Brownian motion (RBM) on
$(-\infty, z_T^*]$ (see §2.2 of Harrison 1985 for a definition). The remainder of this subsection is
devoted to the derivation of $z_T^*$, which can be found by using two well known facts regarding
an RBM on $(-\infty, z]$. First, for $Y$ defined in Proposition 2 we have
$\lim_{t\to\infty} t^{-1}E_x[Y(t)] = \theta_T\mu_T/V$ for $\mu_T > 0$, which is independent of the base stock level $z$.
Hence, the transportation cost does not affect the selection of $z$, and the problem simplifies
to minimizing $\limsup_{T\to\infty} \frac{1}{T} E\left[\int_0^T g(Z(t))\,dt\right]$ subject to (26).

Second, the steady state density for $Z$ is given by $p_Z(x) = \nu_T e^{\nu_T(x - z)}$ if $x \le z$ and
$p_Z(x) = 0$ if $x > z$, where $\nu_T = 2\mu_T/\sigma_T^2 > 0$. Therefore, the optimal base stock level can be
found by minimizing

$$F_T(z) = \int_{-\infty}^{\alpha_T} (-bx + a_1)\,\nu_T e^{\nu_T(x - z)}\,dx + \int_{\alpha_T}^{\beta_T} (a_2x^2 + a_3x + a_4)\,\nu_T e^{\nu_T(x - z)}\,dx + \int_{\beta_T}^{z} (hx + a_5)\,\nu_T e^{\nu_T(x - z)}\,dx \quad \text{for } z \ge \beta_T, \tag{29}$$

and

$$F_T(z) = \int_{-\infty}^{\alpha_T} (-bx + a_1)\,\nu_T e^{\nu_T(x - z)}\,dx + \int_{\alpha_T}^{z} (a_2x^2 + a_3x + a_4)\,\nu_T e^{\nu_T(x - z)}\,dx \tag{30}$$

for $\alpha_T \le z < \beta_T$. The constants in (29) and (30) have the same definitions as in §2.5. Note
that while the optimal base stock level $z_T^*$ always satisfies $z_T^* > \alpha_T$ (this is easily seen by the
fact that $g(x)$ is linear and has a negative slope for $x < \alpha_T$), it need not be larger than $\beta_T$.
The following proposition is derived by using integration by parts on (29) and (30),
and then taking the first two derivatives of $F_T(z)$ with respect to $z$.
Proposition 3. The value that minimizes $F_T(z)$ is

$$z_T^* = -\frac{1}{\nu_T}\ln\left[\left(\frac{h}{b + h}\right)\frac{\nu_T(\beta_T - \alpha_T)}{e^{\nu_T(\beta_T - \alpha_T)} - 1}\right] + \alpha_T \quad \text{if } z_T^* \ge \beta_T; \tag{31}$$

otherwise, $z_T^*$ is the solution to

$$\frac{2a_2}{\nu_T}\,e^{-\nu_T(z_T^* - \alpha_T)} + 2a_2 z_T^* + a_3 - \frac{2a_2}{\nu_T} = 0. \tag{32}$$

Furthermore, the predicted optimal cost is $F_T(z_T^*) = hz_T^* + a_5$ if $z_T^* \ge \beta_T$, and
$F_T(z_T^*) = a_2(z_T^*)^2 + a_3 z_T^* + a_4$ otherwise.

One can show (from the fact that $F_T(z)$ is convex and continuously differentiable)
that there is a unique optimum base stock level; that is, either there exists a solution $z_T^*$ to
(31) that satisfies $z_T^* \ge \beta_T$ or a solution $z_T^*$ to (32) that satisfies $z_T^* < \beta_T$, but not both.
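The two cases of Proposition 3 can be checked numerically. The sketch below builds a continuously differentiable cost of the form used in §2.5 (slope $-b$ below $\alpha_T$, quadratic in between, slope $h$ above $\beta_T$), computes the base stock level from (31) or, failing that, from (32) by bisection, and evaluates the expected cost under the exponential steady-state density by a midpoint rule. The helper names, the free constant `a4` and the parameter values are assumptions for illustration only.

```python
import math


def make_cost(alpha, beta, b, h, a4):
    """Piecewise cost g with slope -b below alpha and slope h above beta."""
    a2 = (b + h) / (2.0 * (beta - alpha))
    a3 = -b - 2.0 * a2 * alpha                             # slope -b at alpha
    a1 = a2 * alpha ** 2 + a3 * alpha + a4 + b * alpha     # continuity at alpha
    a5 = a2 * beta ** 2 + a3 * beta + a4 - h * beta        # continuity at beta

    def g(x):
        if x <= alpha:
            return -b * x + a1
        if x >= beta:
            return h * x + a5
        return a2 * x * x + a3 * x + a4

    return g, a2, a3


def base_stock(alpha, beta, b, h, nu, a2, a3):
    """Base stock level from (31) when it lands at or above beta, else from (32)."""
    width = nu * (beta - alpha)
    z = alpha - math.log(h / (b + h) * width / math.expm1(width)) / nu
    if z >= beta:
        return z

    def condition(z):                                      # left side of (32), increasing in z
        return (2 * a2 / nu) * math.exp(-nu * (z - alpha)) + 2 * a2 * z + a3 - 2 * a2 / nu

    lo, hi = alpha, beta
    for _ in range(200):                                   # bisection on [alpha, beta]
        mid = 0.5 * (lo + hi)
        if condition(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def expected_cost(g, z, nu, steps=20000, tail=40.0):
    """E[g(Z)] for Z = z - Exp(nu), by a midpoint rule on a truncated range."""
    du = tail / nu / steps
    return sum(g(z - (k + 0.5) * du) * nu * math.exp(-nu * (k + 0.5) * du) * du
               for k in range(steps))
```

With `alpha = -2`, `beta = 1`, `b = 4`, `h = 1`, a small `nu` (for example 0.5) puts the optimum above `beta` through (31), while a large `nu` (for example 5) puts it inside the quadratic region through (32). In both cases the numerically integrated cost at the optimum matches the value of `g` there, in line with the predicted optimal cost stated in the proposition.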

2.7     The  Proposed  Policy 

In  this  subsection  we  map  the  solution  of  the  approximating  heavy  traffic  control  problem 
into  a  policy  for  the  original  IRP  with  TSP  routing.  The  control  concerns  two  decisions: 
whether  the  vehicle  should  be  busy  or  idle,  and  how  to  assign  the  load  among  the  retailers 
during  a  tour.  We  address  the  load  allocations  first. 

Since the system evolves dynamically in time, the decision of how much of the load to
leave at each retailer is best delayed until the vehicle arrives at the site. Let $t_0$ correspond to
the epoch at which the vehicle leaves the warehouse with a full load, and consider the epoch
$t_i^- > t_0$, which is the point in time just before the vehicle arrives at retailer $i$. At time $t_i^-$, the
state of the system is given by the inventory levels at the retailers, $(Q_1(t_i^-), \ldots, Q_m(t_i^-))$,
and the size of the remaining load, $L(t_i^-)$. The mapping from heavy traffic solution to
proposed policy is straightforward: the proposed policy attempts to track the heavy traffic
solution (in particular, the optimal cycle placement) as closely as possible. The key issue
to be addressed is that the heavy traffic solution is expressed in terms of normalized space
and time and in terms of the total embedded inventory process, whereas the proposed policy
must be expressed in terms of $(Q_1(t_i^-), \ldots, Q_m(t_i^-), L(t_i^-))$.

Recall that equation (27) relates the scaled cycle placement vector $x_i$ and the
normalized total embedded inventory $Z(t) = x$. Since the load allocation decision is taken when
each retailer is reached, we first establish a relationship between the current total system
inventory $Q(t_i^-) = \sum_j Q_j(t_i^-)$ and the corresponding embedded inventory level. Because
we need to reverse the scalings in the solution to the heavy traffic control problem, let us
define the unscaled embedded inventory $q = \sqrt{n}\,x$ and the unscaled cycle placement vector
$q_i = \sqrt{n}\,x_i$. In keeping with the behavior predicted by the heavy traffic averaging principle,
we develop this relation under a deterministic evolution for the retailer inventories over the
course of a cycle. If the vehicle leaves the warehouse at time $t_0$, then it arrives at retailer
$i$ at time $t_i^-$, where $t_i = t_0 + \theta_{0i}^{TSP}$. Therefore, the retailer inventories relate to the cycle
placement parameters by $Q_j(t_i^-) = q_j + \lambda_j\theta_{ij}^{TSP}$ for $j \ge i$ and $Q_j(t_i^-) = q_j + V_j - \lambda_j\theta_{ji}^{TSP}$ for
$j < i$. Summing over all retailers, we get

$$Q(t_i^-) = \eta_i + \sum_j q_j, \tag{33}$$

where $\eta_i = \sum_{j<i}(V_j - \lambda_j\theta_{ji}^{TSP}) + \sum_{j>i}\lambda_j\theta_{ij}^{TSP}$ is an epoch locator constant for retailer $i$.
Making the substitutions $q_i/\sqrt{n} = x_i$, $q/\sqrt{n} = x$ and $\theta_{0i}^{TSP}/\sqrt{n} = \bar{\theta}_{0i}^{TSP}$ into (27) yields the
unscaled version of constraint (27),

$$\sum_i q_i = q - \sum_i \lambda_i\theta_{0i}^{TSP}. \tag{34}$$

Using equations (33) and (34), we can express the total inventory at time $t_0$ (i.e., when the
vehicle was at the warehouse) as a translation of the inventory vector at time $t_i^-$:

$$Q(t_0) = q = \sum_j Q_j(t_i^-) - \eta_i + \sum_j \lambda_j\theta_{0j}^{TSP} \quad \text{for } i = 0, \ldots, m. \tag{35}$$

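This equation maps the current inventory levels $Q_j(t_i^-)$ into the one-dimensional quantity $q$
that is required to interpret the heavy traffic results. In particular, for a given value of $q$ we
can find $q^* = \sqrt{n}\,x^*$, the corresponding optimal (unscaled) cycle placement parameters, by
reversing the normalizations in Proposition 1. As before, this is done by letting $q_i/\sqrt{n} = x_i$,
$q/\sqrt{n} = x$, $V_i/\sqrt{n} = \bar{V}_i$, and $\theta_{0i}^{TSP}/\sqrt{n} = \bar{\theta}_{0i}^{TSP}$ to obtain the following expressions for $q^*$ in
terms of the state $q$:

Region 1:

$$q \le \alpha_T = \sum_i \lambda_i\theta_{0i}^{TSP} - \sum_i \frac{b + h_i}{b_i + h_i}\,V_i, \tag{36}$$

$$q_i^* = -\frac{b + h_i}{b_i + h_i}\,V_i \quad \text{for } i \ne p, \tag{37}$$

$$q_p^* = q - \sum_i \lambda_i\theta_{0i}^{TSP} + \sum_{i \ne p}\frac{b + h_i}{b_i + h_i}\,V_i; \tag{38}$$

Region 2:

$$\alpha_T < q < \beta_T = \sum_i \lambda_i\theta_{0i}^{TSP} - \sum_i \frac{h_i - h}{b_i + h_i}\,V_i, \tag{39}$$

$$q_i^* = \frac{V_i}{b_i + h_i}\left(2a_2 q + a_3 - h_i\right); \tag{40)-(41}$$

Region 3:

$$\beta_T \le q, \tag{42}$$

$$q_i^* = -\frac{h_i - h}{b_i + h_i}\,V_i \quad \text{for } i \ne \ell, \tag{43}$$

$$q_\ell^* = q - \sum_i \lambda_i\theta_{0i}^{TSP} + \sum_{i \ne \ell}\frac{h_i - h}{b_i + h_i}\,V_i, \tag{44}$$

which are independent of the scaling parameter $n$.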

We can now use these results to determine the delivery size at retailer $i$. Under the
deterministic inventory evolution for the optimal cycle placement, $Q_i(t_i^+)$ (i.e., the inventory
level at retailer $i$ just after the delivery is made) satisfies

$$Q_i(t_i^+) = q_i^* + V_i. \tag{45}$$

If we deliver $d_i$ units to retailer $i$ then the actual inventory after the delivery is made is
$Q_i(t_i^-) + d_i$; equating this quantity to (45) yields the desired delivery size,

$$d_i = q_i^* + V_i - Q_i(t_i^-). \tag{46}$$

Because we cannot allocate more than the available load and do not want to make negative
deliveries, the proposed delivery size is given by

$$d_i^* = \max[d_i, 0] + \min[0, L(t_i^-) - d_i] \quad \text{for } i = 1, 2, \ldots, m - 1. \tag{47}$$

Finally, to guarantee that the vehicle returns to the warehouse empty, we set

$$d_m^* = L(t_m^-). \tag{48}$$

We could in principle allow negative deliveries, as long as there is inventory available
at  retailer  i  and  the  total  amount  of  load  the  vehicle  carries  as  it  leaves  this  retailer  is  kept 
under  its  total  capacity.  However,  because  the  vehicle  returns  empty,  the  items  accrued  from 
negative  deliveries  would  most  likely  be  shifted  to  the  last  few  retailers  of  the  tour,  which  will 
not  necessarily  bring  the  state  of  the  system  closer  to  the  optimal  cycle;  hence,  we  disallow 
negative  deliveries. 

To recapitulate, the proposed dynamic delivery allocations are derived by the following
procedure: (i) observe the current inventory levels $(Q_1(t_i^-), \ldots, Q_m(t_i^-))$ and compute the
unscaled embedded inventory $q$ via (35), (ii) use (36)-(44) to derive the optimal unscaled cycle
placement parameters $q^*$, and (iii) observe the current remaining load $L(t_i^-)$ and compute
the proposed delivery size via (46)-(48).
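Steps (i) and (iii) of this procedure can be sketched as follows; step (ii) is the region lookup of (36)-(44) and is left out. This is a minimal illustration, assuming retailers are indexed 1..m in tour order with index 0 reserved for the warehouse, and with `arrival[j]` standing for $\theta_{0j}^{TSP}$. The function and argument names are hypothetical.

```python
def embedded_inventory(i, inventory, nominal, rate, arrival):
    """Unscaled embedded inventory q from equation (35), evaluated just before
    the vehicle reaches retailer i (i = 0 means the vehicle is at the warehouse)."""
    m = len(inventory) - 1
    # Epoch locator constant eta_i: retailers already served on this tour ...
    eta = sum(nominal[j] - rate[j] * (arrival[i] - arrival[j]) for j in range(1, i))
    # ... plus retailers still to be served after retailer i.
    eta += sum(rate[j] * (arrival[j] - arrival[i]) for j in range(i + 1, m + 1))
    offset = sum(rate[j] * arrival[j] for j in range(1, m + 1))
    return sum(inventory[1:]) - eta + offset


def delivery_size(i, m, q_star_i, nominal_i, inventory_i, load):
    """Proposed delivery d_i* from equations (46)-(48)."""
    if i == m:
        return load                                   # (48): return empty
    d = q_star_i + nominal_i - inventory_i            # (46): desired delivery
    return max(d, 0.0) + min(0.0, load - d)           # (47): clip to [0, load]
```

For instance, with demand rates 3 and 2, nominal deliveries 6 and 4, and arrival times 1 and 3, inventories that follow the deterministic cycle with placements $(-1, 2)$ map to the same value $q = 10$ whether the state is read just before retailer 1 or just before retailer 2, which is the purpose of the epoch locator constant.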

We now turn our attention to the busy/idle policy, which has decision epochs when
the vehicle is at the warehouse. At these points in time, the vehicle starts a new tour if the
total inventory level $\sum_j Q_j(t)$ is below the unscaled aggregate base stock level $q_T^* = \sqrt{n}\,z_T^*$;
otherwise, it idles. Reversing the normalizations in Proposition 3 yields the optimal unscaled
base stock level solely in terms of the original problem parameters:

$$q_T^* = -\frac{1}{\nu_T}\ln\left[\left(\frac{h}{b + h}\right)\frac{\nu_T(\beta_T - \alpha_T)}{e^{\nu_T(\beta_T - \alpha_T)} - 1}\right] + \alpha_T \quad \text{if } q_T^* \ge \beta_T; \tag{49}$$

otherwise, $q_T^*$ is the solution to

$$\frac{2a_2}{\nu_T}\,e^{-\nu_T(q_T^* - \alpha_T)} + 2a_2 q_T^* + a_3 - \frac{2a_2}{\nu_T} = 0. \tag{50}$$

Furthermore, the predicted optimal cost is given by $F_T(q_T^*) = hq_T^* + a_5$ if $q_T^* \ge \beta_T$, and by
$F_T(q_T^*) = a_2(q_T^*)^2 + a_3 q_T^* + a_4$ otherwise. The constants in these expressions are the unscaled
counterparts of the ones defined earlier:

2(1 -ptW 


Ut  = 


xerici  +  vc^y 
«.  =  4E^-i:A,<'H^iE^      and


Z^    t  oz       I   2  ^    '   '      2  ^    6,  +  /li 


We have completely characterized a dynamic control policy that depends exclusively
on the original system parameters. The policy specifies the two controllable aspects of the
system: the aggregate base stock level defined in (49)-(50) determines the vehicle idling
policy, and the delivery sizes $d_i^*$ defined in (47)-(48) characterize the allocation of units to
retailers.

3       The  IRP  with  Direct  Shipping 

The  only  difference  between  the  direct  shipping  (DS)  case  and  the  TSP  case  is  the  routing 
scheme  used  when  the  vehicle  is  operating.  We  retain  all  notation  from  §2,  occasionally  using 
the  subscript  "D"  (for  direct  shipping)  in  place  of  the  subscript  "T"  (for  TSP).  Moreover, 
since  the  procedure  is  very  similar  in  both  cases,  we  omit  nearly  all  of  the  details  for  the  DS 
case,  describing  only  the  distinctive  aspects  of  the  analysis.  The  most  significant  difference 
between  DS  and  TSP  is  that  the  DS  case  does  not  have  a  cyclic  structure.  This  has  several 
consequences,  one  of  which  is  that  the  results  are  not  as  theoretically  solid  as  in  the  TSP 
case. 

3.1      Heavy  Traffic  Analysis 

In the DS case, the vehicle always leaves the warehouse with a full load, visits a single retailer
and returns empty, so that every time a retail site is visited its inventory level increases by
$V$ units. As before, it is convenient to express the dynamic allocation as deviations from a
nominal policy. The nominal policy we consider is not achievable: we assume that under
the nominal policy an amount $V_i$ is delivered to retailer $i$ in every delivery. We let $S_D(t)$
denote the number of DS deliveries made by a continuously active vehicle during $[0, t]$. We
then let $\epsilon_i^D(t)$ be defined by the analog of equation (1) obtained by replacing $T$'s by $D$'s.
Then $\epsilon^D(t) = \sum_i \epsilon_i^D(t)$ is the total amount delivered during the current trip. These processes
satisfy (2)-(6). Equation (7) still holds as well, once $T$'s are replaced by $D$'s. The problem
formulation is thus nearly identical to (2)-(12).

The DS case lacks the natural cyclic structure of the TSP. Tour times in the TSP
are i.i.d., and each tour results in the delivery of $V$ units. The DS case would have a cyclic
structure if the sequence of retailers visited followed a cyclic pattern (such as a polling table),
or had a regenerative structure (such as a Markov chain). Neither of these can be used in
the DS case, for exactly the same reason that fixed delivery sizes could not be used in the
TSP case: inventory costs would be infinite over the long run. A dynamic policy, described
in §3.2, is used. For this policy the fraction of total shipments that go to retailer $i$ does not
vary over times of order $n$. To satisfy average demand at all retailers, this fraction must be
$\lambda_i/\lambda$ for retailer $i$. The traffic intensity for the DS system is thus $\rho_D = 2\sum_i \lambda_i\theta_{0i}/V$.

We use the same heavy traffic normalizations as in §2.2. The heavy traffic conditions
are given by (20)-(24), with (21) and (23) understood to hold, respectively, for $\theta_{0i}$ and $c_{0i}$,
$1 \le i \le m$, and (22) replaced by

'-  =  ^(^-')>»-  ''" 

As  mentioned  above,  the  DS  case  does  not  have  the  cyclic  structure  of  the  TSP  case.  Thus  we 
cannot  use  functional  central  limit  theorems  based  on  renewal  processes  to  prove  convergence 
to  a  Brownian  motion.  In  the  DS  case  the  parameter  corresponding  to  Sj-  in  the  TSP  case, 
5£j,  is  given  by  sj)  =  A~^  J^i  ^i^oi^oi-  ^  his  expression  would  clearly  arise  if  the  next  retailer 
were  chosen  in  an  i.i.d.  manner.  It  can  be  shown  to  hold  for  any  policy  where  the  fraction  of 
times  that  a  retailer  is  visited  does  not  vary  over  times  of  order  n  using  the  Random  Time 
Change  Theorem. 

The lack of a cyclic structure makes it impossible to consider an embedded normalized
netput process. Indeed, if we embed at epochs during which the vehicle is at the warehouse,
we will not obtain a meaningful process. We must thus average the normalized netput
process to obtain a meaningful limit. By arguments similar to those in §2.3, this averaged
normalized netput process is well approximated by a Brownian motion $X$ with drift $\mu_D$ and
variance $\sigma_D^2$.

The  averaged  total  inventory  process  is  defined  by 

m  =  X{t)  -  ^\      Y{t)  .  (52) 

Now we slow down time by a factor of $\sqrt{n}$ and turn to the fluid model. Again we
face the problem of optimally placing the limit cycles for the deterministic evolution of the
retailer inventories in $R^m$. In contrast to the TSP case, we introduce an approximation to
facilitate this optimization. We motivate this approximation by use of an example. Suppose
that $V = 60$, $\lambda_1 = 3$, $\lambda_2 = 2$ and each retailer was exactly six time units away from the
warehouse. Then a continually busy vehicle using the polling table (12121) could visit, on
average, retailer 1 every 20 time units and retailer 2 every 30 time units, thereby satisfying
average demand. Although optimally placing a limit cycle for a small polling table such as
this one is manageable, the optimization problem gets unwieldy very quickly as the size of
the table grows. Our approximation assumes the existence of an idealized policy that would
make a delivery to retailer $i$ every $\bar{V}/\lambda_i$ time units in the fluid model; in the context of our
example, we assume that retailer 1 (2) receives a delivery exactly every 20 (30) time units,
even though the (12121) polling table cannot achieve such perfect regularity in the fluid
model.
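The arithmetic of this example can be reproduced with a short script. It is only a sketch: the function name and the dictionary layout are assumptions, and travel times are treated as deterministic.

```python
from fractions import Fraction


def polling_summary(table, capacity, rates, one_way_time):
    """Cycle length and per-retailer visit statistics for a cyclic polling table
    in which every trip is a round trip delivering a full load."""
    trip = 2 * one_way_time                       # out to the retailer and back
    cycle = trip * len(table)
    summary = {}
    for retailer, rate in rates.items():
        stops = [k for k, r in enumerate(table) if r == retailer]
        # Time between consecutive visits, wrapping around to the next cycle.
        gaps = [((stops[(idx + 1) % len(stops)] - s - 1) % len(table) + 1) * trip
                for idx, s in enumerate(stops)]
        summary[retailer] = {
            "mean_gap": Fraction(cycle, len(stops)),
            "gaps": gaps,
            "delivered": capacity * len(stops),   # units delivered per cycle
            "demanded": rate * cycle,             # units demanded per cycle
        }
    return cycle, summary


cycle, summary = polling_summary([1, 2, 1, 2, 1], capacity=60,
                                 rates={1: 3, 2: 2}, one_way_time=6)
```

Each cycle lasts 60 time units and deliveries match demand at both retailers, but the individual gaps between visits are 24, 24 and 12 time units for retailer 1 and 24 and 36 for retailer 2. Only their averages equal 20 and 30, which is the irregularity that the idealized policy ignores.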

Hence,  our  approach  is  to  optimally  place  an  idealized  fluid  cycle  at  the  fast  time 
scale,  and  then  track  this  cycle  as  closely  as  possible  with  our  proposed  policy.  Because 
deliveries  are  perfectly  regular  in  the  idealized  cycle,  the  use  of  this  approximation  causes 
us  to  underestimate  the  inventory  cost  incurred  over  a  cycle;  however,  simulation  results 
in  §6.1  show  that  the  heavy  traffic  analysis  incorporating  this  approximation  appears  to  be 
very  accurate,  at  least  for  the  five-retailer  cases  considered  there.  Moreover,  the  use  of  an 
idealized  cycle  allows  us  to  avoid  the  task  of  determining  the  actual  behavior  of  the  individual 
inventory  levels  on  the  fluid  time  scale.  This  task  is  more  than  just  tedious:  due  to  the  lack 
of  a  cyclic  structure  it  appears  to  be  extremely  difficult. 
Now we turn to the optimal placement of the idealized cycle. We still define the cycle
placement by the vector $(x_1, x_2, \ldots, x_m)$, where $x_i$ represents the lowest point during the
cycle of the fluid inventory level at retailer $i$. Under the idealized policy, the fluid inventories
are similar to the TSP paths pictured in Figure 1; the only differences are the delivery size (in
this case we deliver a full load of $\bar{V}$ units on each visit to a retailer) and the visit frequency,
which equals $\bar{V}/\lambda_i$ in order to maintain a balanced flow.

The next step is to establish the relationship between the cycle placement variables
$x_i$ and the averaged total inventory level $Z(t) = x$. The constraint related to consistency
between individual and total inventory levels takes the form that the averaged total inventory
equals the sum of the average individual inventory levels. The average fluid inventory at
retailer $i$ over an idealized cycle is $x_i + \bar{V}/2$. Hence, when the averaged total inventory
$Z(t) = x$, the cycle placement parameter must satisfy

$$\sum_i x_i = x - \frac{m\bar{V}}{2}. \tag{53}$$

Because the fluid delivery size equals $\bar{V}$ under the DS case, the inventory cost function
$g_i(x_i)$ is given as in (28), except that $\bar{V}_i$ is replaced by $\bar{V}$. Comparing constraints (27)
and (53), it is clear that the optimal cycle placement is precisely the solution given in the
TSP case, except that we replace $\bar{V}_i$ by $\bar{V}$ and $\sum_i \lambda_i\bar{\theta}_{0i}^{TSP}$ by $m\bar{V}/2$.

The computation of the optimal vehicle idling policy is identical to that in §2, except
that the parameter of the exponential stationary density of the RBM is $\nu_D = 2\mu_D/\sigma_D^2$.
Hence, Proposition 3 characterizes the optimal base stock level for the DS case, with $\nu_D$
replacing $\nu_T$, and with the substitutions described above in the corresponding definitions of
the constants $a_i$ and the thresholds $\alpha$ and $\beta$.

3.2      The  Proposed  Policy 

The mapping from heavy traffic solution to proposed policy uses the same philosophy as in
the TSP case. We begin by establishing a relationship between the total inventory in the
system at the current decision epoch and the cycle placement parameters. Although there
are several possible ways to do this, we keep track of the vector process $(\tau_1(t), \ldots, \tau_m(t))$,
which specifies the time of the most recent visit to each retailer. Hence, if we denote the
current time by $t$, then $t - \tau_i(t)$ represents the elapsed time since the vehicle last visited
retailer $i$. The inventory at retailer $i$ at time $t$ relates to the unscaled cycle placement
parameter $q_i = \sqrt{n}\, x_i$ via

$$Q_i(t) = q_i + V - \lambda_i (t - \tau_i(t)). \qquad (54)$$

Since stochastic effects can lead to unusually long intervisit periods, it is possible to have
$V - \lambda_i(t - \tau_i(t)) < 0$, which would make the cycle placement of the retailer higher than
the current inventory level, thereby contradicting the definition of $q_i$. Because one would
expect that $Q_i(t) \ge q_i + \lambda_i \theta_{0i}$, we modify equation (54) to
$Q_i(t) = q_i + \max[V - \lambda_i(t - \tau_i(t)),\ \lambda_i \theta_{0i}]$; this modification leads to considerable
improvements in system performance in the simulation study. Summing over all retailers we
obtain $\sum_i q_i = Q(t) - u(t)$, where
$u(t) = \sum_i \max[V - \lambda_i(t - \tau_i(t)),\ \lambda_i \theta_{0i}]$. Combining this equation with the unscaled version
of (53) yields

$$q = \sum_i Q_i(t) - u(t) + \frac{mV}{2}, \qquad (55)$$

which relates the unscaled averaged inventory $q$ (this quantity represents the unscaled average
total inventory in the DS case) to the current inventory level. Reversing the heavy traffic
normalizations, we obtain the following formulas for the unscaled optimal cycle placement
vector $q^* = \sqrt{n}\, x^*$ given $q$:

Region 1: $q < \alpha_D = -V \sum_i \frac{b + h_i}{b_i + h_i} + \frac{mV}{2}$,

$$q_i^* = -V\,\frac{b + h_i}{b_i + h_i} \quad \text{for } i \ne p, \qquad q_p^* = q + V \sum_{i \ne p} \frac{b + h_i}{b_i + h_i} - \frac{mV}{2};$$

Region 2: $\alpha_D \le q < \beta_D = V \sum_i \frac{h - h_i}{b_i + h_i} + \frac{mV}{2}$,

$$q_i^* = \frac{1}{h_i + b_i} \left( \sum_k \frac{1}{b_k + h_k} \right)^{-1} \left( q - \frac{mV}{2} - V \sum_k \frac{h_i - h_k}{b_k + h_k} \right);$$

Region 3: $\beta_D \le q$,

which are independent of the scaling factor $n$.

We use this cycle placement as an "ideal" $m$-dimensional inventory state for a given $q$
and $u(t)$, and choose the next retailer so as to bring the current inventory vector as close as
possible to this ideal state. Let $t_0$ be the time epoch at which the vehicle is ready to depart
from the warehouse, and consider the inventory evolution over the next delivery trip. Under
a deterministic inventory evolution, the vehicle will reach retailer $i$ (if it chooses to go there
next) at time $t_i = t_0 + \theta_{0i}$. In the deterministic tour corresponding to the optimal cycle
placement, the retailer inventory levels right after a delivery is made to retailer $i$ are given by
$Q_i^*(t_i) = q_i^* + V$ and

$$Q_j^*(t_i) = \max\left[ q_j^* + V - \lambda_j (t_0 + \theta_{0i} - \tau_j(t_0)),\ q_j^* + \lambda_j(\theta_{0i} + \theta_{0j}) \right] \quad \text{for } j \ne i,$$

where the maximization makes the adjustment for long intervisit times as discussed above.
In contrast, under the deterministic evolution the actual inventory vector after a delivery
to retailer $i$ is given by $Q_i(t_i) = Q_i(t_0) + V - \lambda_i \theta_{0i}$ and $Q_j(t_i) = Q_j(t_0) - \lambda_j \theta_{0i}$ for $j \ne i$.
Therefore, the resulting Euclidean distance between the ideal and actual inventory vectors
after a delivery to retailer $i$ is $\Delta(i) = \sqrt{\sum_j [Q_j(t_i) - Q_j^*(t_i)]^2}$. The proposed control sends
the vehicle to retailer $k$, where $k = \arg\min_i \Delta(i)$.
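The retailer selection rule can be sketched in a few lines of Python. This is only an illustration of the modified equation (54), equation (55) and the distance $\Delta(i)$; the function names, argument names and the two-retailer data are hypothetical, and the ideal placement `q_star` is assumed to be supplied by the cycle placement formulas rather than computed here.

```python
import math


def averaged_inventory(Q, tau, t, lam, theta0, V):
    """Return (u, q): the adjustment term u(t) and the averaged inventory q of (55)."""
    m = len(Q)
    # u(t) = sum_i max[V - lam_i (t - tau_i), lam_i * theta_0i]
    u = sum(max(V - lam[i] * (t - tau[i]), lam[i] * theta0[i]) for i in range(m))
    # q = sum_i Q_i(t) - u(t) + m V / 2
    q = sum(Q) - u + m * V / 2.0
    return u, q


def choose_retailer(Q, q_star, tau, t0, lam, theta0, V):
    """Return (k, distances): the retailer minimizing Delta(i) and all Delta(i)."""
    m = len(Q)
    distances = []
    for i in range(m):
        squared = 0.0
        for j in range(m):
            if j == i:
                # Retailer i receives a full load on arrival at t_i = t0 + theta_0i.
                ideal = q_star[j] + V
                actual = Q[j] + V - lam[j] * theta0[i]
            else:
                # Ideal level, with the adjustment for long intervisit times.
                ideal = max(q_star[j] + V - lam[j] * (t0 + theta0[i] - tau[j]),
                            q_star[j] + lam[j] * (theta0[i] + theta0[j]))
                actual = Q[j] - lam[j] * theta0[i]
            squared += (actual - ideal) ** 2
        distances.append(math.sqrt(squared))
    k = min(range(m), key=distances.__getitem__)
    return k, distances


# Hypothetical data: retailer 0 was visited at time 9, retailer 1 at time 1.
lam, theta0, V, t0 = [1.0, 1.0], [1.0, 1.0], 10.0, 10.0
Q, tau, q_star = [9.0, 1.0], [9.0, 1.0], [0.0, 0.0]
u, q = averaged_inventory(Q, tau, t0, lam, theta0, V)
k, distances = choose_retailer(Q, q_star, tau, t0, lam, theta0, V)
```

In this example the vehicle is sent to retailer 1, whose inventory is nearly depleted; a delivery there reproduces the ideal state exactly, whereas a second delivery to retailer 0 would overshoot it.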

Finally, as in the TSP case, the busy/idle control is a direct unscaling of the heavy
traffic results. The only added complexity is that for the DS case the vehicle visits the
warehouse after every delivery and so has many possible idling decision epochs. By equation (55),
our proposed policy idles a vehicle at the warehouse whenever

$$Q(t) - u(t) + \frac{mV}{2} \ge w_D^*,$$

where $w_D^* = \sqrt{n}\, z_D^*$ is the unscaled idling threshold. Making the suitable scaling
substitutions, we have that

$$w_D^* = -\frac{1}{\nu_D} \ln\left( \frac{h}{b + h} \cdot \frac{\nu_D(\beta_D - \alpha_D)}{e^{\nu_D(\beta_D - \alpha_D)} - 1} \right) + \alpha_D \quad \text{if} \quad w_D^* \ge \beta_D; \qquad (56)$$

otherwise, $w_D^*$ is the solution to

$$\frac{1}{\nu_D} e^{-\nu_D(w_D^* - \alpha_D)} + w_D^* - \frac{1}{\nu_D} - \frac{mV}{2} + V \sum_i \frac{h_i}{b_i + h_i} = 0. \qquad (57)$$

Furthermore, the predicted optimal cost is given by

$$F_D(w_D^*) = h w_D^* - h\,\frac{mV}{2} + \frac{V}{2} \sum_i h_i - \frac{V}{2} \sum_i \frac{(h_i - h)^2}{b_i + h_i} \quad \text{if} \quad w_D^* \ge \beta_D,$$

and $F_D(w_D^*) = a_2 (w_D^*)^2 + a_1 w_D^* + a_0$ otherwise. The constants in these equations are

'1-Pd)XV 


vd  = 


2Var  (y: 


h,  m  \ 

— —        and 


k  +  k      2 


/     ,^      h,  mV\        y,-^     kh. 


b,  +  h,         2   )     '    2^b,  +  h,' 
where  (ciDi/^D^Oy)  are  defined  in  the  unsealed  cycle  placement  formulas  above. 
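As a numerical illustration of the idling threshold, the following Python sketch evaluates $w_D^*$ for symmetric costs ($b_i = b$, $h_i = h$), in which case $\alpha_D = -mV/2$ and $\beta_D = mV/2$. It assumes that (56) and (57) take the forms written in the code comments; the function name and the parameter values, including the value of $\nu_D$, are hypothetical.

```python
import math


def ds_base_stock(b, h, m, V, nu):
    """Unscaled DS idling threshold for symmetric costs (a sketch, not the authors' code)."""
    alpha = -m * V / 2.0
    beta = m * V / 2.0
    span = beta - alpha

    # Assumed form of (56):
    #   w = alpha - (1/nu) ln( h/(b+h) * nu*span / (exp(nu*span) - 1) ),
    # valid when the resulting w is at least beta.
    w = alpha - math.log(h / (b + h) * nu * span / math.expm1(nu * span)) / nu
    if w >= beta:
        return w

    # Assumed form of (57) with symmetric costs:
    #   exp(-nu (w - alpha))/nu + w - 1/nu - mV/2 + mV h/(b+h) = 0.
    def residual(x):
        return (math.exp(-nu * (x - alpha)) / nu + x - 1.0 / nu
                - m * V / 2.0 + m * V * h / (b + h))

    # The residual is negative at alpha and increasing on [alpha, beta],
    # so plain bisection is sufficient.
    lo, hi = alpha, beta
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


w_high_backorder = ds_base_stock(b=5.0, h=1.0, m=5, V=10.0, nu=0.1)
w_low_backorder = ds_base_stock(b=1.0, h=1.0, m=5, V=10.0, nu=0.1)
```

With these inputs the high backorder cost case falls in the closed form branch, while the low backorder cost case requires the root search.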

4      Comparison  of  TSP  and  DS  Routing 

In this section we compare the relative performance of the two fixed routing schemes (TSP
and DS). The predicted cost functions $F_T$, $F_D$ derived in §2 and §3 represent only the
inventory component of the system cost. Denote a generic fixed routing scheme by
$\mathcal{R} \in \{\text{TSP}, \text{DS}\}$, and by $C(\mathcal{R})$ the total cost for the system under this scheme. The total
system cost is obtained by adding the transportation cost (or equivalently, subtracting the
idleness reward) to the inventory cost; that is, we set
$C(\mathcal{R}) = F_{\mathcal{R}}(w_{\mathcal{R}}^*) - r(1 - \rho_{\mathcal{R}})$. Notice that the transportation
cost is independent of the stochastic nature of the system.

A crucial observation is that $\rho_D < \rho_T$; this fact (see Rubio for a proof) is a simple
consequence of the triangle inequality. Although trivial to prove, this inequality has several
important implications. First, the DS policy achieves lower transportation costs than the
TSP policy. This is quite interesting since, at first glance, one might expect the converse
to hold. However, minimization of the steady state transportation cost in the IRP context
is equivalent to maximizing the number of items delivered per unit time traveled; hence,
full load direct shipping provides the highest transportation efficiency of any fixed routing
scheme. More importantly, for any given problem instance, the demand rate can be increased
until $\rho_T = 1$ and $\rho_D < 1$; that is, there exist some demand levels where the DS policy would
be stable while the TSP policy would not. We should note that $\rho_{\mathcal{R}} < 1$ is a necessary
condition for stability of any fixed routing scheme $\mathcal{R}$ but it is not sufficient. In particular,
having $\rho_{\mathcal{R}} < 1$ will keep the total inventory stable but, in the absence of adequate dynamic
load allocation, it is possible to accumulate inventory at one retail site while backorders grow
without bound at another. Hence DS will dominate TSP routing as $\rho_T \to 1$ as long as some
form of stable dynamic allocation is used in the DS case.

The remainder of this section investigates the relative performance of the DS and
TSP schemes as a function of the cost parameters $r$ and $b$. However, readers should keep
in mind that while the qualitative statements below are true, these results are not exact,
because our calculation of $F_D(w_D^*)$ is approximate and represents a slight underestimate of
the true heavy traffic cost under the DS policy. The inequality $\rho_D < \rho_T$ implies that DS is
preferred to the TSP policy if the transportation cost is high enough. In particular, since
$C(\mathcal{R}) = F_{\mathcal{R}}(w_{\mathcal{R}}^*) - r(1 - \rho_{\mathcal{R}})$, the DS
policy achieves a lower overall cost for any $r > (F_D(w_D^*) - F_T(w_T^*))/(\rho_T - \rho_D)$. While the
value of this threshold cost may be found numerically for any particular problem instance,
a more precise characterization requires a better understanding of the relationship between
the inventory costs in both systems. Unfortunately, it is hard to make simple inventory cost
performance comparisons for the different routing schemes, primarily because the base stock
levels, and hence the predicted inventory costs, are not in closed form (see equations (50)
and (57)).

To study the relative inventory cost performance of the TSP and DS schemes, let us
consider the case where the inventory costs at the retailers are symmetric (i.e., $h_i = h$ and
$b_i = b$ for all $i$) and $b$ becomes large. Because the values of $w_T^*$ and $w_D^*$ in (49) and (56) are
increasing in $b/h$, one expects that there exist some critical values $b_T, b_D$ such that if $b$ is
increased above them (while leaving $h$ fixed) the optimal base stock is given in closed form.
These critical values indeed exist and, for the case of symmetric costs, have the following
closed form expressions:

For $b > \max\{b_T, b_D\}$, the inventory cost difference $F_T(w_T^*) - F_D(w_D^*)$ can be expressed as

(58)


As $b \to \infty$ the value of (58) is dominated by the term $(\nu_T^{-1} - \nu_D^{-1}) \ln(1 + b/h)$, whose sign
will be the same as the sign of $\nu_D - \nu_T$. Define the critical value

where $\exp[x] = e^x$. Then for $b > \max\{b_T, b_D, b_c\}$, the DS policy achieves the lower inventory
cost if and only if $\nu_D - \nu_T > 0$, where $\nu_{\mathcal{R}}$ is the exponential parameter for the steady state
distribution of the RBM associated with routing scheme $\mathcal{R}$. Because $\rho_D < \rho_T$, the condition
$\sigma_D^2 > \sigma_T^2$ is required for the TSP policy to be preferred. For the case of deterministic travel
times it follows that $\sigma_D^2 = \sigma_T^2$, and so DS dominates in the high backorder case. Moreover, if
both $\mu_D$ and $\mu_T$ are finite then the difference in mean distance travelled must be $O(n^{-1/2})$;
see equation (62). Hence, in the heavy traffic limit, we actually have $\sigma_D^2 = \sigma_T^2$.

This  result  is  somewhat  counterintuitive:  since  the  TSP  policy  makes  smaller  and 
more  frequent  deliveries  to  each  retailer,  it  might  be  expected  to  outperform  the  DS  scheme 
in  terms  of  inventory  cost.  However,  for  large  backorder  penalties,  the  TSP  policy  sacrifices 
efficiency  over  the  long  run  via  its  smaller  drift,  and  causes  the  total  inventory  to  spend  too 
much  time  in  the  expensive  backorder  regions. 


5      The  IRP  with  Dynamic  Routing 

5.1      Formulation  of  the  Limiting  Control  Problem 

Consider now a situation where, once the vehicle is loaded at the warehouse, it can embark
on either a full load TSP tour or a direct shipment to some retailer. All other aspects of the
problem (e.g., sources of uncertainty, cost structure) remain the same as in the fixed routing
cases. Because the formulation of the dynamic problem is a natural extension of the two
fixed routing problems described earlier, we only sketch the argument and refer readers to
Rubio for a detailed treatment; furthermore, much of the earlier notation will be reused.
If $Q_i(0) = 0$ then the system state equations are given by

$$Q_i(t) = V_i S_T(B_T(t)) + V_i S_D(B_D(t)) - D_i(t) + \epsilon_i^D(t) + \epsilon_i^T(t) \quad \text{for } t \ge 0, \qquad (59)$$

and the cumulative idle time process is $I(t) = t - B_T(t) - B_D(t)$ for $t \ge 0$. Notice that,
according to the previous definitions of the delivery allocation controls, $\epsilon_i^T(t) + \epsilon_i^D(t)$
represents the cumulative deviation from the nominal allocation over past TSP cycles/DS trips
plus the amount delivered over the current cycle/trip at retailer $i$.

The first step in the development of a heavy traffic approximation is to characterize
the influence of the routing control on the total netput. Let $\delta(t) = B_D(t)/(B_T(t) + B_D(t))$
denote the cumulative fraction of busy time that the DS service has been used. Then the
netput for the dynamic routing IRP system is

^^^^  =  (^^^  ~  '^^^^^  +  2T^\Af^^^  -  aJ  i  +  V  [Sr{BT{t))  +  Su[Bo{i))]  -  V{t).  (60) 
Notice  that  this  equation  reduces  to  (13)  when  TSP  routing  is  always  used.  Summing  the 
equations  in  (59)  over  the  retailers  and  substituting  the  relevant  definitions  into  (60),  we 
obtain  the  following  expression  for  the  total  inventory  in  terms  of  the  netput  and  controls: 

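$$Q(t) = X(t) - \left( \frac{V}{\theta_T}(1 - \delta(t)) + \frac{\lambda V}{2 \sum_j \lambda_j \theta_{0j}}\, \delta(t) \right) I(t) + \epsilon(t), \qquad (61)$$

where $\epsilon(t) = \epsilon^T(t) + \epsilon^D(t)$.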

The heavy traffic conditions are a union of the conditions for the two fixed routing
problems. Conditions (22) and (51) require the traffic intensity under both the TSP and
DS policies to approach one in the limit; however, now we do not require $\mu_T$ to be positive,
because the condition $\mu_D > 0$ ensures that a stable control is possible. Hence, in terms of
the problem data, the retailers are required to be located fairly close together relative to
their distance to the warehouse. More precisely, the average travel times must satisfy

$$\theta_T - 2\theta_{0i} = O\left( \frac{1}{\sqrt{n}} \right) \quad \text{for } i = 1, \ldots, m. \qquad (62)$$

One consequence of equation (62) is that the quantities $V/\theta_T$ and $\lambda V/(2 \sum_j \lambda_j \theta_{0j})$
appearing in (61) only differ by $O(n^{-1/2})$; hence, in the heavy traffic limit, these two terms
are equal and the coefficient in front of the idleness term in (61) is a constant and independent
of the control policy employed. Nonetheless, we will maintain equation (61) as is, and view
the time-dependent coefficient as a refinement of the heavy traffic limit.

Let $\mathcal{R}(t) \in \{\text{TSP}, \text{DS}\}$ denote the routing mode that is used at time $t$ in the limiting
control problem. We want to consider intervals of time during which one of the two controls is
used exclusively, and approximate the resulting normalized netput process by a diffusion with
control-dependent drift and variance. In order to do this we need to reconcile the different
definitions of the netput process that are approximated in the two cases: the embedded
netput process in the TSP case, and the averaged netput process in the DS case. The
lack of a cyclic structure in the DS case forced us to use averaging there, while the choice
of embedding in the TSP case was made for convenience. Thus we translate the embedded
netput process from the TSP case, $\chi(t)$, to an averaged netput process $\bar{\chi}(t)$. This translation
consists of adding a constant, $\eta$, which needs to be determined: $\bar{\chi}(t) = \chi(t) + \eta$. The same
constant will be used to translate the normalized total inventory process for the TSP case as
well, and can be calculated in that context using the relationship between total embedded
inventory $x$ and cycle placement variables $x_i$ developed in §2.5. Let $\bar{x}$ denote the average
inventory level we are seeking. Then, since the average of the sum of the inventory levels is
equal to the sum of the averages, we have $\bar{x} = \sum_i (x_i + V_i/2) = \sum_i x_i + V/2$. Combining this
with (27) yields $\bar{x} = x - \sum_i \lambda_i \theta_{0i}^{T} + V/2$, so that

$$\eta = \frac{V}{2} - \sum_i \lambda_i \theta_{0i}^{T}. \qquad (63)$$

"  i 

The  role  of  this  translation  constant  in  the  implementation  of  the  proposed  policy  is  described 
in  the  next  subsection. 

By the arguments used in the TSP and DS cases, we can deduce that the normalized
averaged total netput process $\bar{\chi}$ is well approximated by a diffusion process $X(t, \mathcal{R}(t))$ with
control-dependent drift and variance given by

$$\mu(\mathcal{R}(t)) = \begin{cases} \mu_D & \text{if } \mathcal{R}(t) = \text{DS} \\ \mu_T & \text{if } \mathcal{R}(t) = \text{TSP} \end{cases} \qquad \text{and} \qquad \sigma^2(\mathcal{R}(t)) = \begin{cases} \sigma_D^2 & \text{if } \mathcal{R}(t) = \text{DS} \\ \sigma_T^2 & \text{if } \mathcal{R}(t) = \text{TSP} \end{cases}$$

respectively. In other words, the routing control switches the netput of the system from
one Brownian motion to another. Furthermore, these two Brownian motions have the same
parameters as the diffusion limits of the corresponding fixed routing cases. By equation (62),
$\sigma_D^2$ and $\sigma_T^2$ only differ by $O(n^{-1/2})$; once again, we retain this refinement of the heavy traffic
limit.

Because the normalized idleness process $Y(t)$ will only be exerted on a set of measure
zero, we are free to choose $\mathcal{R}(t)$ for all times $t$, not just the nonidling times. If we
normalize the allocation deviation process as $\epsilon(nt)/\sqrt{n}$, let $\delta(nt)$ be the fraction of busy time
devoted to DS in the heavy traffic system, and define the averaged total inventory level for the system as

then we obtain the same time scale decomposition as before: under the fluid scaling, $Z(t)$ is
fixed and the individual fluid levels $(W_1(t), \ldots, W_m(t))$ move deterministically at a finite rate.
The exact evolution of $(W_1(t), \ldots, W_m(t))$ is determined by the averaged inventory level, the
cycle placement and the routing scheme in use at time $t$. We may therefore decompose the
dynamic routing IRP into: (i) given $Z(t) = x$ and routing mode $\mathcal{R}(t) \in \{\text{TSP}, \text{DS}\}$, use
the results in §2 and §3 to determine the optimal cycle placement and the corresponding
inventory cost function $g_{\mathcal{R}(t)}(x)$; (ii) choose the nonanticipating control $(Y(t), \mathcal{R}(t))$ (where
$Y$ is nondecreasing and right continuous) to minimize


$$\limsup_{T \to \infty} \frac{1}{T} E\left[ \int_0^T g_{\mathcal{R}(t)}(Z(t, \mathcal{R}(t)))\, dt - r Y(T) \right] \qquad (65)$$

subject to (64).


5.2      Optimization  of  a  Triple  Threshold  Policy 

The diffusion control problem (64)-(65) appears to be difficult to tackle for two reasons: the
coefficient in front of the control $Y(t)$ in (64) depends upon the routing control $\mathcal{R}(t)$, and
the control-dependent cost function $g_{\mathcal{R}}(x)$ is very complex. An algorithm is used in the next
section to numerically compute the solution to (64)-(65). Here, we specialize our analysis to
the following triple threshold policy that is characterized by the parameters $z_1 \le z_2 \le z_3$: the
vehicle is busy whenever the total averaged inventory $Z(t) < z_3$, and idles when $Z(t) \ge z_3$;
while busy, the vehicle uses the TSP routing scheme whenever $Z(t) \in [z_1, z_2)$, and uses the
DS mode whenever $Z(t) < z_1$ or $Z(t) \in [z_2, z_3)$. Our goal in this subsection is to find the
optimal values for the parameters $(z_1, z_2, z_3)$.
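The triple threshold rule is simple enough to state as code. The following Python sketch maps an averaged inventory level to an action; the function name and the string labels are hypothetical, and the same function applies to the unscaled quantities when thresholds $w_i$ are used in place of $z_i$.

```python
def routing_mode(z, z1, z2, z3):
    """Action prescribed by a triple threshold policy with z1 <= z2 <= z3."""
    if not (z1 <= z2 <= z3):
        raise ValueError("thresholds must satisfy z1 <= z2 <= z3")
    if z >= z3:
        return "idle"   # inventory at or above the base stock level
    if z1 <= z < z2:
        return "TSP"    # intermediate inventory: small, frequent deliveries
    return "DS"         # z < z1 or z2 <= z < z3: full load direct shipping


# Special cases: pure TSP (z1 = -infinity, z2 = z3) and pure DS (z1 = z2 = z3).
pure_tsp = routing_mode(-100.0, float("-inf"), 3.0, 3.0)
pure_ds = routing_mode(0.0, 3.0, 3.0, 3.0)
```

Setting $z_1 = -\infty$ and $z_2 = z_3$ recovers the fixed TSP policy, and $z_1 = z_2 = z_3$ recovers the fixed DS policy, so both fixed routing schemes are members of this policy class.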

Motivated by our analysis of the stochastic ELSP with setup times in Markowitz,
Reiman and Wein, we conjecture that the most general form of the optimal policy to the
diffusion control problem (64)-(65) is of the triple threshold form described above. In terms of
the ELSP, the DS policy corresponds to using large lot sizes and the TSP scheme corresponds
to using small lot sizes: large lot sizes and the DS policy both use the server (which is the
vehicle here) in a more efficient fashion. The most general form of the ELSP solution derived
in Markowitz, Reiman and Wein corresponds to the triple threshold policy (see Figure 5 of
that paper), and in some cases (see Figure 6 of that paper) the solution could be described
with fewer thresholds. We conjecture that the IRP solution is either a triple threshold policy,
a double threshold policy with $z_2 = z_3$, or a single threshold policy, which could be either
the TSP policy ($z_1 = -\infty$, $z_2 = z_3$) or the DS policy ($z_1 = z_2 = z_3$ or $z_1 = z_2 = -\infty$). In §7
we describe the rationale behind our conjecture.

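Our analysis of the triple threshold policy requires knowledge of the stationary
distribution of the controlled reflected diffusion process and the long run expected average idleness
rate. The stationary distribution $\pi(x)$ must satisfy the following system of differential
equations (see Karlin and Taylor 1981):

$$\frac{\sigma^2(x)}{2} \frac{d^2}{dx^2} \pi(x) - \mu(x) \frac{d}{dx} \pi(x) = 0, \qquad x < z_3, \qquad (66)$$

$$\frac{\sigma^2(x)}{2} \frac{d}{dx} \pi(x) - \mu(x) \pi(x) = 0, \qquad x = z_3, \qquad (67)$$

where the diffusion parameters are given by

$$\mu(x) = \begin{cases} \mu_T & \text{if } x \in [z_1, z_2) \\ \mu_D & \text{if } x < z_1 \text{ or } x \in [z_2, z_3) \end{cases} \qquad \text{and} \qquad \sigma^2(x) = \begin{cases} \sigma_T^2 & \text{if } x \in [z_1, z_2) \\ \sigma_D^2 & \text{if } x < z_1 \text{ or } x \in [z_2, z_3). \end{cases}$$

The only continuous density that satisfies (66) and (67) is

$$\pi(x) = \begin{cases} k_1 \nu_D e^{\nu_D(x - z_1)} & \text{if } x \le z_1 \\ k_1 \nu_D e^{\nu_T(x - z_1)} & \text{if } x \in (z_1, z_2] \\ k_2 \nu_D e^{\nu_D(x - z_3)} & \text{if } x \in (z_2, z_3], \end{cases} \qquad (68)$$

where

$$k_1 = \frac{\nu_T}{\nu_T\left(1 + e^{\nu_T(z_2 - z_1)} e^{\nu_D(z_3 - z_2)} - e^{\nu_T(z_2 - z_1)}\right) + \nu_D\left(e^{\nu_T(z_2 - z_1)} - 1\right)}$$

and

$$k_2 = \frac{\nu_T e^{\nu_T(z_2 - z_1)} e^{\nu_D(z_3 - z_2)}}{\nu_T\left(1 + e^{\nu_T(z_2 - z_1)} e^{\nu_D(z_3 - z_2)} - e^{\nu_T(z_2 - z_1)}\right) + \nu_D\left(e^{\nu_T(z_2 - z_1)} - 1\right)};$$

this is the stationary density for the total inventory process.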
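The stationary density can be checked numerically. The Python sketch below assumes that the density in (68) is piecewise exponential with rate $\nu_D$ below $z_1$, rate $\nu_T$ on $(z_1, z_2]$ and rate $\nu_D$ on $(z_2, z_3]$, with the constants $k_1$ and $k_2$ fixed by continuity and normalization; the function names and the parameter values are hypothetical.

```python
import math


def density_constants(nu_T, nu_D, z1, z2, z3):
    """Constants (k1, k2) of the assumed piecewise exponential density."""
    e_T = math.exp(nu_T * (z2 - z1))
    e_D = math.exp(nu_D * (z3 - z2))
    denom = nu_T * (1.0 + e_T * e_D - e_T) + nu_D * (e_T - 1.0)
    return nu_T / denom, nu_T * e_T * e_D / denom


def density(x, nu_T, nu_D, z1, z2, z3):
    """Stationary density of the total inventory at x (zero above z3)."""
    k1, k2 = density_constants(nu_T, nu_D, z1, z2, z3)
    if x > z3:
        return 0.0
    if x <= z1:
        return k1 * nu_D * math.exp(nu_D * (x - z1))
    if x <= z2:
        return k1 * nu_D * math.exp(nu_T * (x - z1))
    return k2 * nu_D * math.exp(nu_D * (x - z3))


def region_masses(nu_T, nu_D, z1, z2, z3):
    """Probability of (-inf, z1], (z1, z2] and (z2, z3], integrated in closed form."""
    k1, k2 = density_constants(nu_T, nu_D, z1, z2, z3)
    below = k1
    middle = k1 * (nu_D / nu_T) * math.expm1(nu_T * (z2 - z1))
    upper = k2 * (1.0 - math.exp(-nu_D * (z3 - z2)))
    return below, middle, upper


def ds_fraction(nu_T, nu_D, z1, z2, z3):
    """Long run fraction of time spent in the two DS regions."""
    below, _, upper = region_masses(nu_T, nu_D, z1, z2, z3)
    return below + upper


params = dict(nu_T=0.5, nu_D=0.8, z1=-2.0, z2=1.0, z3=3.0)
masses = region_masses(**params)
delta = ds_fraction(**params)
```

The three region masses sum to one and the density is continuous at both switching points, which is a useful sanity check on the constants before they are used in a cost calculation.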

Now  we  turn  to  the  expected  idleness  rate.  Taking  expectations  on  both  sides  of  (64), 
rearranging terms, dividing through by $t$ and taking the limit as $t \to \infty$ we have that


lim  -E 

(-.00  t 


!^^-^<+^(^^TO 


Y{t) 


lim  -E[X{t,^{t))]  -  lim  -E[Z{t,^{t))]. 


(69) 


The first term on the RHS of (69) corresponds to the asymptotic growth rate of the netput
process under the proposed policy. If we let $\delta = \lim_{t \to \infty} \delta(t)$ denote the long run fraction of
time that the vehicle uses the DS routing policy, then the long run growth rate is

$$\lim_{t \to \infty} \frac{1}{t} E[X(t, \mathcal{R}(t))] = (1 - \delta)\mu_T + \delta \mu_D. \qquad (70)$$

By the definition of the triple threshold policy, it follows that

$$\delta = \int_{-\infty}^{z_1} \pi(x)\, dx + \int_{z_2}^{z_3} \pi(x)\, dx = k_1 + k_2\left(1 - e^{-\nu_D(z_3 - z_2)}\right).$$

The  left  side  of  (69)  can  be  expressed  as 


)'-<''^S£)^'^^ 


(71) 
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Finally,  according  to  the  steady  state  distribution  for  Z{t)  in  (68),  z  =  limt_oo  E[Z{t)]  exists 
and  is  finite.  Hence,  the  second  term  on  the  right  side  of  (69)  vanishes  in  the  limit.  Canceling 
this  term  and  substituting  (70)  and  (71)  into  (69)  gives 

y  =  ;un-^Elnt)]  =  V^[i-       ,i_„2j;_,,,„,^,,,^      )■  (72) 

We may now write an expression for the steady state cost of the IRP under the triple
threshold dynamic routing policy as a function of the control parameters. The problem is
hence reduced to finding $(z_1^*, z_2^*, z_3^*)$ that minimize

$$F(z_1, z_2, z_3) = \int_{-\infty}^{z_1} g_D(x) \pi(x)\, dx + \int_{z_1}^{z_2} g_T(x) \pi(x)\, dx + \int_{z_2}^{z_3} g_D(x) \pi(x)\, dx - r y. \qquad (73)$$

Unfortunately, $F(z_1, z_2, z_3)$ is rather complicated and a closed form solution for the
optimal control parameters does not seem possible. Furthermore, an explicit expression for
$F(z_1, z_2, z_3)$ is not available in general because its exact form depends on the relationship
between $(\alpha_T, \beta_T)$ and $(\alpha_D, \beta_D)$, which are the parameters that define the three characteristic
regions for the cycle placement and inventory cost solutions. By considering different problem
parameters, one may get either $\beta_D > \beta_T$ or $\beta_D < \beta_T$.

5.3      The  Proposed  Policy 

The first step in mapping the solution of the diffusion control problem into a policy for the
dynamic IRP is to obtain unscaled threshold levels. If we define the unscaled thresholds as
$w_i = \sqrt{n}\, z_i$ and make the parameter scaling substitutions as in the fixed routing cases, then
the scaling factor $n$ cancels out of the expression for

$$F(w_1, w_2, w_3) = \sqrt{n}\, F\left( \frac{w_1}{\sqrt{n}}, \frac{w_2}{\sqrt{n}}, \frac{w_3}{\sqrt{n}} \right), \qquad (74)$$

and what remains is a function only of the original system parameters. In §6.2, the thresholds
minimizing (74) are determined numerically, and we refer to these cost-minimizing thresholds
as $(w_1^*, w_2^*, w_3^*)$.

We now describe how these thresholds dictate whether the vehicle employs the TSP
mode, the DS mode or idles; this decision is made at the epochs when the vehicle is at
the warehouse. Because of the translation of the embedded total inventory process into the
averaged total inventory process in (63), we must keep track of the most recently employed
shipping option. If DS was most recently used then we calculate the unscaled averaged
inventory $q$ in (55) and compare it to the unscaled threshold levels $(w_1^*, w_2^*, w_3^*)$ to determine
the TSP/DS/idle policy. If TSP was most recently used then we define $q$ by combining (35)
and (63):

9  =  E^.(^r)-'/.  +  EvJ'''  +  '7,  (75) 


where


rj  =  ^f,=  ^-~'£X,el'''  .  (76) 

is the unscaled translation constant. In this case the TSP/DS/idling decision is determined
by comparing $q$ in (75) to the unscaled threshold levels $(w_1^*, w_2^*, w_3^*)$. Finally, the detailed
allocation decisions (which retailer to visit next in the DS case and how many units to deliver
to each retailer in the TSP case) are determined exactly as in Sections 2 and 3, except that
we now use the value of $q$ in (75) in the TSP case.

6      Computational  Results 

This  section  contains  a  series  of  computational  experiments  aimed  at  assessing  the  accuracy 
of  the  heavy  traffic  analysis  and  determining  what  aspects  of  the  control  policy  are  most 
important  for  good  system  performance.  The  computational  study  is  described  in  two  parts: 
the  fixed  routing  IRP  and  the  dynamic  routing  IRP. 

6.1      Fixed  Routing  IRP 

The Monte Carlo simulation experiments performed in this subsection consider systems that
have five retailers and Poisson demand processes. We also set the transportation cost rate
$r$ equal to zero and concentrate on the inventory cost. The total arrival rate $\lambda$ is varied to
obtain different utilization rates; however, the fraction of demand represented by retailer $i$
is fixed so that $\lambda_1 = \lambda/5$, $\lambda_2 = \lambda/10$, $\lambda_3 = \lambda/10$, $\lambda_4 = \lambda/5$ and $\lambda_5 = 2\lambda/5$. The travel time
random variables $T_{ij}$ are iid second order Erlang. The mean travel times are adjusted so that
$10\theta_T = V$ always holds; this allows us to consider several vehicle sizes while maintaining the
traffic intensity at $\rho_T = 0.1\lambda$.

We perform four simulation experiments aimed at various aspects of system performance.

Experiment 1. The first set of simulation runs quantifies the cost improvement
obtained under the TSP policy by recalculating the cycle placement at each retailer, as
opposed to determining these values only once per cycle (i.e., when the vehicle is at the
warehouse). We let $b_i = b = 5$, $h_i = h = 1$ for $i = 1, \ldots, 5$ and consider the mean travel
times $\theta_{01} = \theta_{50} = \theta_T/10$, $\theta_{12} = \theta_{23} = \theta_{34} = \theta_{45} = \theta_T/5$. These travel times are consistent
with a pentagon structure, where the five retailers are placed at the vertices of a pentagon,
and the warehouse is located midway between stations 1 and 5.

We consider nine different scenarios, which are generated by the different combinations
of three vehicle sizes (100, 10, 5) and three traffic intensities (0.5, 0.7, 0.9); notice that some
of these scenarios grossly violate the heavy traffic conditions. For all cases with $\rho_T \le 0.7$,
we simulated three replications of 36,000 time units (starting with an empty system and
discarding the first 2000 time units) with cycle placement recalculation at the retailers and
three more with calculation only at the warehouse; for the $\rho_T = 0.9$ instances, the length of
each replication was increased to 240,000 time units (discarding the first 20,000 time units).
This simulation design was used throughout our study and allowed us to keep the standard
deviation of the cost estimate under 1% of its mean.
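The bookkeeping behind such a simulation design can be sketched in Python as follows. This is not the authors' simulation code: the function names are hypothetical, the cost rate path is assumed to be piecewise constant between events, and the precision measure is taken to be the standard error of the mean across replications relative to that mean.

```python
import statistics


def time_average_cost(event_times, cost_rates, warmup, horizon):
    """Time-averaged cost rate on [warmup, horizon].

    cost_rates[i] is the cost rate on [event_times[i], event_times[i + 1]);
    the last rate applies until the horizon.
    """
    total = 0.0
    for i, rate in enumerate(cost_rates):
        start = event_times[i]
        end = event_times[i + 1] if i + 1 < len(event_times) else horizon
        # Only the part of the interval after the warm-up period counts.
        overlap = min(end, horizon) - max(start, warmup)
        if overlap > 0.0:
            total += rate * overlap
    return total / (horizon - warmup)


def replication_summary(costs):
    """Mean, standard error and relative standard error across replications."""
    mean = statistics.mean(costs)
    stderr = statistics.stdev(costs) / len(costs) ** 0.5
    return mean, stderr, stderr / mean


# Hypothetical cost path: rate 10 on [0, 1000), 2 on [1000, 3000), 4 afterwards.
avg = time_average_cost([0.0, 1000.0, 3000.0], [10.0, 2.0, 4.0],
                        warmup=2000.0, horizon=4000.0)
mean, stderr, relative = replication_summary([10.0, 10.2, 9.8])
```

Discarding the warm-up period removes the bias from starting with an empty system, and the relative standard error indicates whether the replications are long enough.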

Table 1 summarizes the results of the experiment. The entries in the table represent
the increase in the average inventory cost when delivery sizes are calculated only at the
warehouse, and not adjusted over the course of the tour. As predicted by heavy traffic theory,
the advantage obtained by recalculation at the retailers becomes quite small when the traffic
intensity is high (about 1% when $\rho_T = 0.9$). However, the recalculation advantage increases
with small vehicle sizes at lower traffic intensities. These observations complement those in
Kumar, Schwarz and Ward, who focus on this issue (calculating delivery allocations once
per cycle or at each retailer) using a much different model. All subsequent TSP simulations
employ the cycle placement recalculation at the retailers.

              $\rho_T = 0.5$    $\rho_T = 0.7$    $\rho_T = 0.9$
$V = 100$          4.0%              5.0%              1.1%
$V = 10$           7.4%             18.6%              0.7%
$V = 5$            7.3%             18.1%              1.1%

Table 1: Cost increase when placement calculation is only at warehouse.

Experiment 2. The second simulation experiment assesses the accuracy of the heavy
traffic analysis by comparing the cost incurred under the derived base stock levels with
the cost incurred under the best possible base stock level. We maintain the same set-up
as in the first experiment, except that asymmetric cost cases are also considered, where the
holding rates are (1, 1, 2, 2, 2) and the backorder rates are (5, 10, 5, 10, 5) for the five retailers,
respectively. For each of these 18 cases (three vehicle sizes, three traffic intensities and two
cost structures), we performed an exhaustive search in a series of simulations (each consisting
of three replications with the length described in Experiment 1) to find the base stock level
that provides the lowest system cost.

Table  2  summarizes  the  results;  each  entry  represents  the  suboptimality  (within  the 
class  of  base  stock  policies)  of  the  cost  incurred  by  using  the  derived  base  stock  level  instead 
of  the  optimal  base  stock  level  found  by  exhaustive  search.  The  base  stock  levels  derived 
from  the  heavy  traffic  analysis  are  very  accurate  for  moderate  and  high  traffic  intensities, 
and only degrade in the ρ_T = 0.5, small vehicle size scenarios, which are not apt to occur in
practice. 

                      ρ_T = 0.5    ρ_T = 0.7    ρ_T = 0.9
V = 100    Symm.         0.0%         0.9%         2.6%
           Asym.         0.6%         2.2%         0.0%
V = 10     Symm.        19.1%         4.3%         1.7%
           Asym.         6.7%         2.2%         1.8%
V = 5      Symm.        14.6%         1.1%         1.5%
           Asym.        11.6%         1.1%         0.6%

Table 2: Cost suboptimality of derived base stocks for pentagon TSP.

Experiment 3. Now we study the performance of the direct shipping policy, and
compare it to the performance of the TSP policy on the same system. As before, this is
done by comparing the average cost obtained under the derived base stock levels with that
under the optimal base stock level found by exhaustive search. Because the DS policy has a
huge drift advantage over the TSP policy in the pentagon topology used for experiments 1
and 2, this experiment uses the travel times θ_{0i} = 0.45θ_T for i = 1, ..., 5 and
θ_{12} = θ_{23} = θ_{34} = θ_{45} = θ_T/40, so that ρ_T = 0.1λ and ρ_D = 0.09λ. This case
will be referred to as the wedge topology, since these travel times are consistent with such
a shape. In practice, TSP tours are often generated by placing the warehouse at the center of
the "pie", dividing the pie into wedges and solving a TSP on each wedge; see Figure 1 of Bell
et al. and Figure 3 of Federgruen and Simchi-Levi. The other problem parameters remain as in
the symmetric cost scenarios in Experiment 2, except for the fact that we also simulate the
DS policy for the case when λ = 10. The TSP policy is not simulated for this case because it
corresponds to ρ_T = 1, and the system is not stable under this scheme. Hence, we consider 12
cases (four traffic intensities and three vehicle sizes) for the DS policy and nine cases for
the TSP.
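As a quick consistency check on the wedge travel times, the following sketch normalizes the TSP tour time θ_T to one, confirms that the radial and adjacent legs add up to a full tour, and recovers ρ_D = 0.09λ from ρ_T = 0.1λ. The function names are illustrative only, the relation ρ_T = 0.1λ is taken from the text rather than derived, and the sketch assumes that both routing schemes deliver one full load per trip.

```python
from fractions import Fraction


def wedge_travel_times(theta_T=Fraction(1)):
    """Wedge travel times, expressed in units of the TSP tour time theta_T."""
    radial = Fraction(45, 100) * theta_T   # theta_0i: warehouse <-> retailer i
    adjacent = theta_T / 40                # theta_12 = theta_23 = theta_34 = theta_45
    return radial, adjacent


def traffic_intensities(lam):
    """Return (rho_T, rho_D) for demand rate lam, assuming rho_T = 0.1 * lam."""
    radial, adjacent = wedge_travel_times()
    tsp_tour = 2 * radial + 4 * adjacent   # out, four hops between neighbours, back
    ds_round_trip = 2 * radial             # out and back to a single retailer
    rho_T = Fraction(1, 10) * lam
    # Same load per trip, so the intensities scale with the trip durations.
    rho_D = rho_T * ds_round_trip / tsp_tour
    return rho_T, rho_D


for lam in (5, 7, 9, 10):
    rho_T, rho_D = traffic_intensities(lam)
    print(f"lambda={lam}: rho_T={float(rho_T):.2f}, rho_D={float(rho_D):.2f}")
```

At λ = 10 the TSP scheme reaches ρ_T = 1 while DS stays at ρ_D = 0.9, which is why only the DS policy is simulated in that case.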

Tables  3  and  4  summarize  the  results  of  this  experiment.  The  entries  in  Table  3 
compare  the  performance  of  the  proposed  base  stock  level  to  the  cost  obtained  under  the  best 
base  stock  level  for  the  same  policy.  The  results  for  the  TSP  policy  are  roughly  comparable 
to  the  corresponding  results  for  the  pentagon  topology  in  Table  2,  although  the  base  stock 
levels in Table 3 do not seem to be as accurate for the ρ_T = 0.9 cases. The derived base
stock  levels  for  the  DS  case  are  very  accurate,  even  when  the  heavy  traffic  conditions  are 
severely  violated. 

                    ρ_T = 0.50   ρ_T = 0.70   ρ_T = 0.90   ρ_T = 1.00
                    ρ_D = 0.45   ρ_D = 0.63   ρ_D = 0.81   ρ_D = 0.90
V = 100    TSP         2.41%        5.31%        6.10%        N.A.
           DS          4.13%        2.42%        1.74%        0.52%
V = 10     TSP        11.04%        0.00%        3.67%        N.A.
           DS          4.30%        3.01%        2.43%        0.08%
V = 5      TSP        17.48%        0.00%        1.19%        N.A.
           DS          3.70%        1.75%        1.40%        0.00%

Table 3: Cost suboptimality of derived base stocks for wedge topology.

Table 4 presents a comparison of the average inventory cost for the DS and TSP
policies. The percentage difference between the TSP cost and the DS cost is given by

$$\frac{\text{TSP cost} - \text{DS cost}}{\text{DS cost}} \times 100\%.$$

The entries labeled 'Sim.' represent the percentage difference in inventory cost when base
stock levels are found by exhaustive search. Notice that for low utilization levels and large
vehicle sizes the TSP policy enjoys a considerable advantage over the DS scheme. This
advantage erodes as the traffic intensity increases until, for the cases where ρ_T = 1, the TSP
cost becomes unbounded and the DS policy is trivially preferred. Recall that the percentage
differences in Table 4 only assess the inventory costs, and the DS policy will always incur
lower transportation costs than the TSP policy. Hence, the desired policy is a function of
the transportation cost rate r.

The  entries  labeled  'Pred.'  represent  the  difference  in  inventory  costs  as  predicted  by 
the  heavy  traffic  analysis.  Table  4  shows  that  the  heavy  traffic  analysis  provides  reliable 
estimates  for  the  relative  performance  of  the  two  policies,  except  when  the  vehicle  size  is 
very small. In the one case where the prediction errs in the sign of the percentage difference
(λ = 9, V = 10), the costs for both policies are very close. The predicted differences are
larger than the simulated differences in 6 of the 9 cases, including the (ρ_T = 0.9, V = 100)
heavy traffic case; this discrepancy may be due to the fact that our DS estimates are based
on the idealized fluid cycles, and hence should underestimate the true heavy traffic cost
under  DS. 
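The two counts quoted above can be recomputed directly from the entries of Table 4. The sketch below transcribes the simulated and predicted percentage differences for the nine stable cases and tallies how often the prediction exceeds the simulation and where the signs disagree; the dictionary layout and function names are illustrative choices, not part of the original study.

```python
# Table 4 entries (percent); keys are vehicle sizes V, columns are rho_T = 0.5, 0.7, 0.9.
RHO_T = (0.5, 0.7, 0.9)
SIM = {100: (-80.2, -72.5, -29.1), 10: (-66.5, -58.6, -3.0), 5: (-56.7, -64.5, 12.6)}
PRED = {100: (-78.4, -71.8, -22.1), 10: (-76.7, -65.0, 2.5), 5: (-74.3, -57.8, 25.2)}


def pct_difference(tsp_cost, ds_cost):
    """Percentage difference between TSP and DS inventory cost, relative to DS."""
    return 100.0 * (tsp_cost - ds_cost) / ds_cost


def compare(sim, pred):
    """Count cases where the prediction exceeds the simulation; list sign errors."""
    larger, sign_errors = 0, []
    for v in sim:
        for rho, s, p in zip(RHO_T, sim[v], pred[v]):
            if p > s:
                larger += 1
            if (p > 0) != (s > 0):
                sign_errors.append((rho, v))
    return larger, sign_errors


larger, sign_errors = compare(SIM, PRED)
print(larger, sign_errors)
```

With ρ_T = 0.1λ, the single sign disagreement at (ρ_T = 0.9, V = 10) is the (λ = 9, V = 10) case mentioned in the text.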

                    ρ_T = 0.50   ρ_T = 0.70   ρ_T = 0.90   ρ_T = 1.00
                    ρ_D = 0.45   ρ_D = 0.63   ρ_D = 0.81   ρ_D = 0.90
V = 100    Sim.       -80.2%       -72.5%       -29.1%        N.A.
           Pred.      -78.4%       -71.8%       -22.1%         ∞
V = 10     Sim.       -66.5%       -58.6%        -3.0%        N.A.
           Pred.      -76.7%       -65.0%         2.5%         ∞
V = 5      Sim.       -56.7%       -64.5%        12.6%        N.A.
           Pred.      -74.3%       -57.8%        25.2%         ∞

Table 4: Inventory cost comparison, (TSP cost − DS cost)/DS cost: wedge topology.

Experiment 4. The last experiment in this subsection measures the increase in cost
incurred by using a base stock level different from the one proposed in the heavy traffic
analysis. We already have the required data for this analysis from the exhaustive search
performed in the simulation experiments above. Figure 2 plots three examples of the cost
increase with respect to the proposed policy, as a function of the base stock level (expressed
in units of vehicle size). These three cases correspond to the DS system on the wedge
topology for V = 100 and λ ∈ {5, 7, 9}. The behavior illustrated here is typical of all other
instances analyzed in our simulation experiments. Three characteristics worth noting are:
(i) the inventory cost is convex in the base stock level; (ii) the cost performance remains
relatively constant over a range of approximately one vehicle size around the optimal base
stock level; and (iii) once the base stock level moves beyond this range in either direction
the cost performance deteriorates rapidly.

6.2      Dynamic  IRP 

An Algorithmic Solution. In an attempt to understand the nature of the optimal solution
to the dynamic IRP, we pursue a computational approach to problem (64)-(65). The
algorithmic procedure, which was pioneered by Kushner (1977), approximates the diffusion
process by a discrete time and space Markov chain, and then numerically solves the control
problem by dynamic programming. Weak convergence methods have been developed to
verify that the controlled Markov chain and its optimal cost approximate arbitrarily closely (at
an increased computational expense) the controlled diffusion process and its optimal cost.
Interested readers are referred to Kushner and Dupuis (1992) for a recent account of this
research area.

Figure 2: Sensitivity of inventory cost to base stock level.

λ = 9.0                            Total Averaged Inventory
V = 50     r = 500    Idle at      101
                      Use DS       (61, 101]
                      Use TSP      (-20, 61]
                      Use DS       (-∞, -20]
                      δ*           65.0%
V = 10     r = 100    Idle at      23
                      Use DS       (11, 23]
                      Use TSP      (-4, 11]
                      Use DS       (-∞, -4]
                      δ*           73.7%
           r = 50     Idle at      21
                      Use DS       (15, 21]
                      Use TSP      (-5, 15]
                      Use DS       (-∞, -5]
                      δ*           46.2%

Table 5: Triple threshold policies obtained from the Markov chain approximation.
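To make the discretization step concrete, here is a minimal sketch of a locally consistent Markov chain for a one-dimensional diffusion with constant drift and variance, using the usual upwind finite-difference construction for such chains. It is not the implementation used to solve (64)-(65); the function name `chain_step`, the grid spacing `h` and the example parameters are assumptions made purely for illustration.

```python
def chain_step(mu, sigma2, h):
    """Transition probabilities and time step of an approximating chain.

    The chain lives on a grid of spacing h and moves one grid point up or
    down per step. Returns (p_up, p_down, dt).
    """
    q = sigma2 + h * abs(mu)                          # normalizing constant
    p_up = (sigma2 / 2 + h * max(mu, 0.0)) / q        # drift pushes mass upward if mu > 0
    p_down = (sigma2 / 2 + h * max(-mu, 0.0)) / q     # and downward if mu < 0
    dt = h * h / q                                    # interpolation interval
    return p_up, p_down, dt


# Hypothetical parameters: negative drift, variance 2, grid spacing 0.1.
mu, sigma2, h = -0.5, 2.0, 0.1
p_up, p_down, dt = chain_step(mu, sigma2, h)

mean_step = h * (p_up - p_down)      # equals mu * dt (local consistency of the mean)
second_moment = h * h                # equals sigma2 * dt + h * |mu| * dt
print(p_up, p_down, dt, mean_step, second_moment)
```

In a controlled problem each action (here, idling, DS or TSP) would have its own drift, and dynamic programming over the resulting chain selects the action at each grid point.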

We  developed  an  implementation  of  this  algorithm  and  solved  (64)-(65)  for  36  cases 
that  use  the  wedge  topology  and  the  symmetric  cost  structure  described  in  experiment  3; 
these  36  cases,  which  will  be  enumerated  later,  are  characterized  by  the  vehicle  capacity  V, 
the demand rate λ and the transportation cost r. For brevity's sake, we omit a description
of the algorithm and refer readers to Rubio for a detailed account of this work. (There is
an error in Rubio's description of the implementation of the algorithm; in the improvement
stage of the policy improvement algorithm on page 119, the new routing scheme R_{k+1}(x) is
found by minimizing the gain γ in (5.71) while keeping z and R(y) fixed for all other states
y ≠ x, not by minimizing the right side of (5.68) for the given (V_k(x), γ_k).)

The numerical results are consistent with our conjectures about the optimal solution
to (64)-(65). In three of the 36 cases the policy generated by the Markov chain
approximation algorithm is of the triple threshold form; in the remaining 33 cases, the solution is a
degenerate form of the triple threshold policy that can be described by one or two thresholds.
Table 5 specifies the proposed triple threshold policies (in terms of the unscaled inventory)
for the three cases, along with the proportion of time that DS is used. As expected, the TSP
scheme is used in an interval containing zero.
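A triple threshold policy is simply a piecewise-constant map from the total retailer inventory to an action. The sketch below encodes that map, using the thresholds from the (V = 50, r = 500) entry of Table 5 as an example. The function name and the boundary convention (half-open intervals closed on the left) are assumptions; Table 5 itself reports intervals closed on the right.

```python
import math


def triple_threshold_action(w, w1, w2, w3):
    """Action for total retailer inventory w under thresholds w1 <= w2 <= w3."""
    if not (w1 <= w2 <= w3):
        raise ValueError("thresholds must satisfy w1 <= w2 <= w3")
    if w >= w3:
        return "idle"   # enough inventory: keep the vehicle at the warehouse
    if w >= w2:
        return "DS"     # high inventory: save on transportation cost
    if w >= w1:
        return "TSP"    # near zero: frequent small deliveries
    return "DS"         # deep backorders: use the larger drift of DS


# Thresholds read from the V = 50, r = 500 case of Table 5.
W1, W2, W3 = -20, 61, 101
for w in (-50, 0, 80, 150):
    print(w, triple_threshold_action(w, W1, W2, W3))
```

The degenerate forms fall out of the same function: setting `w2 = w3` gives a double threshold policy, and `w1 = -math.inf` with `w2 = w3` gives a fixed TSP policy.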

The Analytical Double Threshold Policy. Because the numerical results in 33 of the
36 cases can be described by one or two thresholds, and because the Markov chain procedure
only offers an approximate solution to problem (64)-(65) (in particular, the solution is not
independent of the heavy traffic parameter n), we turn to an analytical derivation of the
double threshold policy, where the vehicle uses DS when Z(t) < z_1, uses TSP routing when
Z(t) ∈ [z_1, z_3) and idles when Z(t) ≥ z_3. The parameters z_1 and z_3 are derived by setting z_2
equal to z_3 in the results in §5.2; in particular, the function F in (74) must be minimized.
For the symmetric cost, wedge topology cases, this function can take eight different possible
functional forms, and a steepest descent method was used to find the global minimum of
this function; see Appendix B of Rubio for complete details.

The results for the 36 cases are presented in Table 6. The derived double threshold
policy is degenerate in 22 of the 36 cases, with the TSP policy optimal in 16 cases and DS
optimal in 6 cases. In the remaining 14 cases where two thresholds are required, the value
of δ* is often very close to zero, suggesting that the TSP policy is close to optimal. While
results are not reported here, similar findings continued to hold for different values of the
backordering-to-holding cost ratio b/h, as well as for wider or narrower wedges (i.e., for other
values of θ_{01}/θ_{12}).
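The case counts above can be reproduced from the δ* column of Table 6. The sketch below transcribes the reported δ* values and classifies each case by whether DS is used never, always, or part of the time. The classification relies on the rounded percentages as printed, which is an assumption; a reported 0.0% could in principle hide a very small positive value.

```python
from collections import Counter

# delta* (percent of time DS is used) from Table 6.
# Keys are (V, r); columns are lambda = 5.0, 7.0, 8.0, 9.0.
DELTA_STAR = {
    (100, 500): (0.0, 0.0, 1.3, 8.5),
    (100, 100): (0.0, 0.0, 1.0, 7.7),
    (100, 50): (0.0, 0.0, 1.0, 7.5),
    (50, 500): (0.0, 0.0, 2.0, 100.0),
    (50, 100): (0.0, 0.0, 1.3, 8.2),
    (50, 50): (0.0, 0.0, 1.2, 8.0),
    (10, 500): (100.0, 100.0, 100.0, 100.0),
    (10, 100): (0.0, 0.0, 3.9, 100.0),
    (10, 50): (0.0, 0.0, 3.3, 11.9),
}


def classify(delta):
    """Label a case by the reported share of time spent on DS."""
    if delta == 0.0:
        return "fixed TSP"
    if delta == 100.0:
        return "fixed DS"
    return "double threshold"


counts = Counter(classify(d) for row in DELTA_STAR.values() for d in row)
print(dict(counts))
```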

For the 33 cases where the Markov chain procedure generated a single or double
threshold solution, these solutions matched the analytical solutions in Table 6 very closely;
hence, the Markov chain solutions are not included here. For the three cases where the
Markov chain procedure generated a triple threshold policy, we used simulation to compare
the Markov chain policy in Table 5 to the analytically derived policy in Table 6. Table 7 shows
that the double threshold solution outperforms the policy generated by the Markov chain
approximation in all three cases. Although not reported here, the predicted improvement in
gain from the Markov chain solution versus the derived double threshold policy was indeed
relatively small for all three cases. Moreover, heavy traffic conditions (20) and (24) are
violated in the 18.1% suboptimality case.

                               λ = 5.0     λ = 7.0     λ = 8.0     λ = 9.0
V = 100    r = 500    w_3*        20          45          75         138
                      w_1*      -312        -151        -100         -51
                      δ*        0.0%        0.0%        1.3%        8.5%
           r = 100    w_3*        20          45          76         141
                      w_1*      -385        -168        -111         -58
                      δ*        0.0%        0.0%        1.0%        7.7%
           r = 50     w_3*        20          45          76         142
                      w_1*      -404        -171        -112         -59
                      δ*        0.0%        0.0%        1.0%        7.5%
V = 50     r = 500    w_3*        10          23          39         112
                      w_1*      -10?         -64         -43         112
                      δ*        0.0%        0.0%        2.0%      100.0%
           r = 100    w_3*        10          24          39          73
                      w_1*      -202         -80         -53         -28
                      δ*        0.0%        0.0%        1.3%        8.2%
           r = 50     w_3*        10          24          39          73
                      w_1*      -202         -82         -54         -29
                      δ*        0.0%        0.0%        1.2%        8.0%
V = 10     r = 500    w_3*        18          20          21          24
                      w_1*        18          20          21          24
                      δ*      100.0%      100.0%      100.0%      100.0%
           r = 100    w_3*         2           6          10          24
                      w_1*       -24         -11          -8          24
                      δ*        0.0%        0.0%        3.9%      100.0%
           r = 50     w_3*         2           6          10          18
                      w_1*       -31         -13          -9          -4
                      δ*        0.0%        0.0%        3.3%       11.9%

Table 6: The analytical double threshold policy.

                              r = 500     r = 100     r = 50
                              V = 50      V = 10      V = 10
                              λ = 9.0     λ = 9.0     λ = 9.0
(A): Double Threshold          560.9       116.9        72.5
(B): Markov Chain Appr.        596.6       124.3        85.6
(B − A)/A                       6.4%        6.3%       18.1%

Table 7: Cost performance of Markov approximation method vs. double threshold policy.
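The last row of Table 7 appears to be the relative cost increase of the Markov chain policy (B) over the double threshold policy (A). The sketch below recomputes it from the two simulated cost rows; the dictionary keys and the function name are illustrative.

```python
# Simulated costs from Table 7, keyed by (r, V, lambda):
# (A) analytical double threshold policy, (B) Markov chain approximation policy.
COSTS = {
    (500, 50, 9.0): (560.9, 596.6),
    (100, 10, 9.0): (116.9, 124.3),
    (50, 10, 9.0): (72.5, 85.6),
}


def relative_increase(cost_a, cost_b):
    """Percentage by which cost_b exceeds cost_a."""
    return 100.0 * (cost_b - cost_a) / cost_a


increases = {case: round(relative_increase(a, b), 1) for case, (a, b) in COSTS.items()}
for case, pct in increases.items():
    print(case, f"{pct}%")
```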

Simulation  Study.  With  the  derived  double  threshold  policy  in  hand,  we  now  perform 
a  series  of  simulation  runs  for  the  dynamic  routing  IRP  to  gauge  the  accuracy  of  our  heavy 
traffic  approximations  over  a  range  of  values  for  the  problem  parameters.  We  set  r  =  500 
and  consider  a  total  of  six  cases  (3  traffic  intensities  and  2  vehicle  sizes),  using  the  same  five- 
retailer wedge system that was presented earlier. In our simulation runs, we inadvertently set
η equal to zero in equations (75)-(76). For our examples, η = V(40 − 41ρ_T)/80, which equals
0.14V, 0.09V and 0.04V when ρ_T equals 0.7, 0.8 and 0.9, respectively. Given our analysis
in  Figure  2,  where  system  cost  remains  relatively  constant  over  a  range  of  approximately  V 
around  the  optimal  base  stock  level,  this  omission  is  inconsequential. 
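Evaluating the expression η = V(40 − 41ρ_T)/80 at the three traffic intensities used in this study shows how small the omitted term is relative to one vehicle load. The function name `eta` and the default V = 1 (so that results read as fractions of V) are illustrative choices.

```python
def eta(rho_T, V=1.0):
    """Offset eta = V * (40 - 41 * rho_T) / 80 from the text, in units of inventory."""
    return V * (40 - 41 * rho_T) / 80


# Express eta as a fraction of the vehicle size V for each traffic intensity.
eta_over_V = {rho: round(eta(rho), 2) for rho in (0.7, 0.8, 0.9)}
print(eta_over_V)
```

In every case η is at most about 0.14V, well inside the range of roughly one vehicle size over which Figure 2 shows the cost to be flat.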

The entries in Table 8 represent the cost increase incurred by using the analytically
derived double threshold policy, the best (i.e., base stock level found by exhaustive search)
TSP policy or the best DS policy instead of the best double threshold policy found by
exhaustive search over the (w_1, w_3) plane. In all cases, the delivery allocation is determined
by the dynamic rule derived from the heavy traffic optimal cycle placement. The derived
double threshold policy performs very well, and does not seem to deteriorate at lower traffic
intensities or smaller vehicle sizes.

                      λ = 7.0     λ = 8.0     λ = 9.0
V = 100    Prop.        2.3%        2.2%        2.2%
           TSP*         1.4%        2.0%        6.1%
           DS*         42.5%       27.6%       10.6%
V = 50     Prop.        0.8%        1.3%        3.8%
           TSP*         0.3%        0.8%        0.9%
           DS*         18.4%       11.8%        3.3%

Table 8: Suboptimality of derived double threshold and fixed routing policies.

A glance at the appropriate entries in Table 6 shows that in 5 out of the 6 cases in
Table 8 the value of δ is close to either 0 or 1. The exception is the (λ = 9, V = 100) case,
where δ = 8.5%. Table 8 confirms that the best double threshold policy outperforms either of
the static routing schemes in this case. We changed λ while leaving everything else fixed in an
attempt to find a case where the advantage of the dynamic policy would be more dramatic.
As it turns out, we could not significantly improve over the (λ = 9, V = 100) case; in the
end, the analytically derived double threshold policy was at most 5% better than the best
fixed routing policy. Furthermore, in the cases where the proposed double threshold policy
coincides with either of the fixed routing schemes, the cost increase incurred by choosing the
wrong fixed routing scheme is quite significant (higher than 10% in all 5 cases, and up to
43% for λ = 7, V = 100). These numbers suggest that, while finding the best fixed routing
scheme is very important, the advantage obtained from dynamic routing is quite small in
most problem instances.

7      Summary  and  Conclusions 

The IRP is one of the more challenging problems in operations research, especially when
considered from a dynamic and stochastic viewpoint. We focus on the operational aspects
of the problem and consider a system with a single capacitated vehicle that operates out of
a single warehouse and services a finite set of retailers. By restricting an outgoing vehicle
to deliver full loads to either a single retailer (direct shipping or DS) or along a prespecified
(TSP) tour, we avoid the combinatorial complexities inherent in the problem and maintain a
sharp focus on the crucial tradeoff between inventory costs and transportation costs that lies
at the heart of the IRP. Our modeling of the dynamic stochastic IRP as a queueing control
problem offers a new perspective on the problem: rather than view the IRP as a variant of
the vehicle routing problem, we see it as a variant of a production/inventory control problem
(where the capacitated vehicle plays the role of the production system); as such, this paper
is a natural descendant of Wein and Markowitz, Reiman and Wein, which consider more
conventional production/inventory control problems.

By  assuming  that  the  system  is  operating  in  the  (suitably  defined)  heavy  traffic  regime, 
we  approximate  the  queueing  control  problem  by  a  diffusion  control  problem.  When  only 
TSP  tours  are  allowed,  this  modeling  approach,  together  with  the  application  of  a  heavy 
traffic  time  scale  decomposition,  allows  us  to  fully  characterize  the  solution  to  the  diffusion 
control  problem,  thereby  generating  an  operating  policy  for  the  original  system.  By  assuming 
the existence of a fixed sequence of retailer visits that can achieve constant inter-delivery
times  to  each  retailer  in  the  fluid  limit,  we  perform  a  similar  analysis  for  the  DS  case.  The 
control  policy  in  both  cases  is  characterized  by  a  vehicle  idling  policy,  which  dictates  whether 
a  vehicle  at  the  warehouse  should  sit  idle  or  set  out  with  a  full  load,  and  a  dynamic  allocation 
policy,  which  specifies  how  many  units  to  leave  off  at  each  retailer  under  a  TSP  scheme,  and 
which  retailer  to  visit  next  in  the  DS  scheme. 

We  also  consider  the  case  where  dynamic  route  selection  (either  TSP  or  DS)  is  allowed. 
The  diffusion  control  problem  is  solved  numerically  and  a  class  of  triple  threshold  policies 
is  analyzed.  Finally,  a  series  of  simulation  studies  is  performed  to  complement  the  heavy 
traffic  analysis. 

Our  key  findings  can  be  summarized  as  follows: 

• The inventory component of the total long run average cost depends on the stochastic
characteristics of the system, while the transportation component for a fixed routing
scheme is determined solely from first moment information.

• The vehicle idling policy is characterized by an aggregate base stock level: the vehicle
idles  at  the  warehouse  whenever  the  total  retailer  inventory  exceeds  a  certain  threshold 
level.  The  value  of  the  optimal  base  stock  level  is  independent  of  the  transportation  cost 
rate.  Although  the  existing  IRP  literature  does  not  typically  address  the  vehicle  idling 
issue,  our  simulation  results  show  that  system  performance  is  quite  sensitive  to  the 
value  of  the  base  stock  level,  deteriorating  rapidly  when  the  base  stock  level  differs  from 
the  optimal  value  by  more  than  the  vehicle  capacity.  Moreover,  simulation  results  also 
confirm that the system cost under our derived base stock levels is typically within
several  percent  of  the  cost  achieved  by  the  best  (found  by  exhaustive  search  using 
simulation)  base  stock  level,  unless  the  heavy  traffic  conditions  are  grossly  violated 
(e.g.,  traffic  intensity  equals  0.5  and  vehicle  capacity  is  less  than  or  equal  to  10  units). 

•  The  allocation  of  load  among  the  retailers  is  dictated  by  the  desire  to  concentrate  most 
of  the  total  inventory  (backorders)  at  the  site  with  the  smallest  holding  (backorder) 
cost  rate. 

•  Dynamic  (i.e.,  state-dependent  or  closed-loop)  delivery  allocations  greatly  outperform 
their  static  (state-independent  or  open-loop)  counterparts  in  a  stochastic  environment. 
In  fact,  central  limit  theorem  arguments  indicate  that  static  delivery  allocations  cause 
the  absolute  value  of  the  inventory  levels  to  grow  as  the  square  root  of  time,  thereby 
leading  to  unbounded  costs  over  the  long  run. 

•  The  relative  advantage  of  recalculating  the  load  allocation  at  each  retailer  within  a 
TSP  tour,  as  opposed  to  setting  it  once  at  the  beginning  of  each  tour,  decreases  as 
utilization  increases,  and  vanishes  in  the  heavy  traffic  limit. 

• The policy that achieves the lowest transportation cost is the one that delivers the
largest amount per unit time travelled (subject to meeting average demand).
Therefore, direct shipping is the most transportation-efficient routing scheme; although this
fact is a trivial consequence of the triangle inequality, it is perhaps our most important
observation. This fact helps highlight the basic cost tradeoff in the IRP: DS leads to
smaller transportation costs, but TSP routing, by making smaller and more frequent
deliveries, may lead to smaller inventory costs. This result also implies that DS has
a larger stability region than TSP; that is, for any given problem instance, one can
increase the demand rates to a level where the TSP routing scheme has a traffic
intensity greater than or equal to one and the DS scheme has an intensity less than one.
Moreover, our analysis highlights the danger in employing a myopic policy that always
minimizes current inventory costs (for example, have each outgoing vehicle satisfy as
many backorders as possible, regardless of location); such a policy, which is similar
in spirit (and consequence) to a production lot-sizing policy that frequently breaks a
setup in order to satisfy backordered demand, can easily become unstable.

•  Heavy  traffic  analysis  shows  that  for  cost-symmetric  systems  with  sufficiently  high 
backorder  costs,  DS  will  be  preferred  to  TSP  routing. 

•  Simulation  results  show  that  there  is  often  a  large  difference  in  performance  between  the 
DS  and  TSP  policies.  Although  the  traffic  intensity,  backorder  costs  and  transportation 
costs  all  play  a  significant  role,  the  topology  probably  plays  the  largest  role  in  the 
relative  attractiveness  of  each  policy.  For  systems  with  relatively  high  loads,  it  appears 
that  TSP  could  only  be  a  desirable  alternative  when  the  tour  is  wedge-shaped,  as  is 
often  the  case  in  practice. 

•  The  heavy  traffic  analysis  accurately  predicts  the  relative  cost  of  using  the  fixed  DS 
or  fixed  TSP  schemes.  Hence,  our  procedure  can  be  used  as  an  aid  in  higher  level 
decisions,  as  discussed  at  the  end  of  this  paper. 

• If one can dynamically choose between the DS and TSP options, we conjecture that
the most general form of the solution is a triple threshold policy characterized by
w_1 < w_2 < w_3: if the total retailer inventory is less than w_1 then DS is preferred, if
it is in the interval [w_1, w_2) then TSP is preferred and if it is in the interval [w_2, w_3),
where w_3 is the idling threshold, then DS is preferred. Our rationale is as follows: if
the absolute value of the total retailer inventory is large then the routing scheme may
have little effect on the rate at which inventory costs are incurred. In these cases,
DS may be preferable because it incurs smaller transportation costs. In addition, the
efficiency of DS has a tendency to increase the total inventory level relative to TSP (in
the diffusion control problem the DS option has a larger drift than the TSP option),
and so DS will be even more attractive when the total inventory is much less than
zero, as it will help to decrease future backorders. However, when the total inventory
is in the interval [w_1, w_2), which should contain zero in the nondegenerate case, the
frequent deliveries of TSP lead to fewer backorders and smaller inventory costs, making
it the more attractive alternative. Finally, because the effective penalty for using the
inefficient TSP policy decreases when the total inventory is large, we believe that in
most cases the optimal solution will be no more complex than a double threshold
policy, where w_2 = w_3. This state of affairs is somewhat analogous to the stochastic
ELSP with setup times analyzed in Markowitz, Reiman and Wein, where large (small)
lot sizes correspond to DS (TSP).

• We computed the numerical solution to the diffusion control problem corresponding
to the dynamic IRP for a number of instances, and the results were consistent with
our conjectures: the most general optimal policy was of the triple threshold form, and
in most cases a degenerate form of the policy was optimal: either the fixed DS case
(w_1 = w_2 = -∞ or w_1 = w_2 = w_3), the fixed TSP case (w_1 = -∞, w_2 = w_3) or the
double threshold policy (w_2 = w_3). Moreover, in our limited simulation experiments,
we did not find a numerically computed triple threshold policy that outperformed
the analytically derived double threshold policy (although we did not search beyond
the computed triple threshold values). By performing many exhaustive searches using
simulation, we also found that the best double threshold policy differed from the better
of the two fixed routing policies in only a narrow range of system parameter space; in
this range, the best TSP and DS policies achieve fairly similar performance. Hence,
coupling this observation with a previous one suggests that finding the best fixed
route policy is very important while allowing for dynamic routing provides a much
less substantial benefit; this is particularly true in light of the increased complexity of
implementing a dynamic routing scheme.
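The finding above on static versus dynamic delivery allocations can be illustrated with a deliberately simple toy model, which is not the system studied in this paper: a single retailer, one delivery per period, and demand drawn uniformly from {0, 1, 2, 3}. The static rule delivers the mean demand every period regardless of the state, so the inventory is a driftless random walk; the dynamic rule refills toward a target subject to a capacity of two units. All parameter values, the seed and the function name are assumptions made for this sketch.

```python
import random


def mean_abs_deviation(policy, horizon, checkpoints, reps=200, seed=1):
    """Average |inventory - target| at each checkpoint for a toy single retailer."""
    rng = random.Random(seed)
    target, capacity, mean_demand = 0.0, 2.0, 1.5
    totals = {t: 0.0 for t in checkpoints}
    for _ in range(reps):
        x = target
        for t in range(1, horizon + 1):
            if policy == "static":
                delivery = mean_demand                           # open loop: ignores the state
            else:
                delivery = min(capacity, max(0.0, target - x))   # closed loop: refill toward target
            x += delivery - rng.choice((0, 1, 2, 3))             # random demand with mean 1.5
            if t in totals:
                totals[t] += abs(x - target)
    return {t: totals[t] / reps for t in checkpoints}


static = mean_abs_deviation("static", 1600, (100, 1600))
dynamic = mean_abs_deviation("dynamic", 1600, (100, 1600))
print("static :", static)
print("dynamic:", dynamic)
```

Under the static rule the average deviation grows by roughly a factor of four when the horizon grows by a factor of sixteen, in line with square-root growth, whereas under the dynamic rule it settles at a small constant.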

In summary, the important operational levers for the IRP include the aggregate base stock
level, the dynamic allocation policy and the choice of fixed routing scheme, but not the
dynamic routing policy. Moreover, these key decisions are interrelated and a unified stochastic
control  model,  such  as  the  one  considered  here,  is  required  for  achieving  reliable  system 
performance. 

Two  topics  for  future  research  naturally  come  to  mind.  The  first  is  to  extend  the 
dynamic  routing  scheme  so  as  to  allow  K  different  types  of  routes  (where  K  >  2)  and/or 
to  consider  cyclic  routes  that  use  a  combination  of  DS  and  TSP  (e.g.,  a  cycle  could  consist 
of a TSP tour through retailers 1, 2 and 3, followed by a direct shipment to retailer 2).
Although  in  theory  these  extensions  could  be  incorporated  and  system  improvements  could 
be  achieved,  the  analysis  would  be  tedious  and  it  is  doubtful  that  any  additional  insights 
would  be  found. 

Perhaps  the  most  fruitful  area  for  future  research  would  be  to  develop  the  necessary 
steps  for  a  hierarchical  approach  to  the  general  (multi-vehicle,  multi-warehouse)  IRP;  such 
an  approach  would  be  similar  in  spirit  to  the  vehicle  routing  analysis  performed  by  Simchi- 
Levi,  but  would  also  incorporate  the  inventory  cost  component.  Our  results  for  fixed  route 
policies  provide  estimates  for  the  operating  cost  for  any  system  given  a  particular  assignment 
of  retailers  to  vehicles  and  vehicles  to  warehouses.  Motivated  by  our  observation  that  the 
best  fixed  route  policy  performs  nearly  as  well  as  the  best  dynamic  policy  over  a  broad 
range of parameters, the first level up in the hierarchy could implement some interexchange
optimization algorithm (e.g. a k-opt algorithm as used in the deterministic vehicle routing
literature)  to  find  the  best  such  route.  Higher  levels  in  the  hierarchy  could  then  be  used  to 
select  the  best  possible  assignment  of  vehicles  and  retailers,  and  the  total  number  of  vehicles 
to  have  in  the  system.  At  an  even  higher  level,  these  results  could  be  used  to  decide  on  the 
number and location of warehouses.